SherrySwift

Pinned Repositories

Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Language: Cuda 286 10 2025
H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Language: Python 413 5 3951
LESS
Language: Python 48 1 21
DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
Language: Python 136 2 169
deepcompressor
Model Compression Toolbox for Large Language Models and Diffusion Models
Language: Python 299 9 3423
qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Language: Python 478 8 4028
KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Language: Python 324 12 1729

SherrySwift doesn’t have any repository yet.