SunMarc's Stars
mlabonne/llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
karpathy/LLM101n
LLM101n: Let's build a Storyteller
karpathy/llm.c
LLM training in simple, raw C/CUDA
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
apple/ml-ferret
huggingface/lerobot
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
NVIDIA/FasterTransformer
Transformer-related optimization, including BERT and GPT
pytorch/torchtitan
A native PyTorch library for large model training
huggingface/huggingface_hub
The official Python client for the Hugging Face Hub.
huggingface/datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
huggingface/cookbook
Open-source AI cookbook
intel/intel-extension-for-pytorch
A Python package that extends official PyTorch to easily obtain performance gains on Intel platforms
cuda-mode/resource-stream
CUDA-related news and material links
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
mit-han-lab/qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
huggingface/local-gemma
Gemma 2 optimized for your local machine.
SqueezeAILab/KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
spcl/QuaRot
Code for QuaRot, an end-to-end 4-bit inference of large language models.
jy-yuan/KIVI
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
muellerzr/minimal-trainer-zoo
Minimal example scripts for the Hugging Face Trainer, each focused on staying under 150 lines
NetEase-FuXi/EETQ
Easy and Efficient Quantization for Transformers
VITA-Group/Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
neuralmagic/AutoFP8
mit-han-lab/lmquant
exo-explore/mlx-bitnet
1.58 Bit LLM on Apple Silicon using MLX
LiqunMa/FBI-LLM
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
neuralmagic/compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
aredden/torch-bnb-fp4
Faster PyTorch bitsandbytes 4-bit FP4 nn.Linear ops
muellerzr/import-timer
Pragmatic approach to parsing import profiles for CI