yiliu30's Stars
xai-org/grok-1
Grok open release
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
facebookresearch/faiss
A library for efficient similarity search and clustering of dense vectors.
meta-llama/llama3
The official Meta Llama 3 GitHub site
karpathy/llm.c
LLM training in simple, raw C/CUDA
karpathy/nn-zero-to-hero
Neural Networks: Zero to Hero
ai-boost/awesome-prompts
Curated list of ChatGPT prompts from the top-rated GPTs in the GPTs Store. Prompt engineering, prompt attack, and prompt protection. Advanced prompt engineering papers.
pytorch/torchtune
PyTorch native finetuning library
openai/transformer-debugger
ROCm/HIP
HIP: C++ Heterogeneous-Compute Interface for Portability
pytorch/executorch
On-device AI across mobile, embedded and edge for PyTorch
pytorch/ao
PyTorch native quantization and sparsity for training and inference
pytorch/functorch
functorch provides JAX-like composable function transforms for PyTorch.
huggingface/optimum-quanto
A PyTorch quantization backend for Optimum
OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
SqueezeAILab/SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
tspeterkim/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Vahe1994/SpQR
google-research/sputnik
A library of GPU kernels for sparse matrix operations.
yxli2123/LoftQ
Aaronhuang-778/BiLLM
(ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
puttsk/cuda-tutorial
A set of hands-on tutorials for CUDA programming
IST-DASLab/QUIK
Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference (EMNLP 2024)
xijiu9/Train_Transformers_with_INT4
thu-nics/qllm-eval
Code repository for "Evaluating Quantized Large Language Models"
sunlex0717/DissectingTensorCores
iree-org/iree-torch
Torch Frontend for IREE
facebookexperimental/protoquant
Prototype routines for GPU quantization written using PyTorch.
Quansight/torch-build
Collection of scripts to build PyTorch and the domain libraries from source.