Pinned Repositories
Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
LESS
DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
deepcompressor
Model Compression Toolbox for Large Language Models and Diffusion Models
qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
SherrySwift's Repositories
SherrySwift doesn’t have any repository yet.