Issues
kernel optimized for A100
#25 opened by lisuying214 - 1
e2e demonstration for bigger models
#24 opened by jiwonsong-dev - 3
Question about KV Cache quantization
#23 opened by SherrySwift - 2
TypeError: QLlamaDecoderLayer.forward() got an unexpected keyword argument 'cache_position'
#19 opened by galenyu - 2
LLM model load hanging problem
#18 opened by jimmy-adams - 3
Question regarding the efficiency evaluation
#17 opened by FlyFoxPlayer - 3
The question about calib data
#15 opened - 1
How to load quantized weight?
#14 opened - 10
RuntimeError when quant llama model
#12 opened - 1
AssertionError
#9 opened by muzi0111 - 1
error: same device
#8 opened by muzi0111 - 3
the ppl for llama-7b is very large
#7 opened by priscilla-pan - 3
how to compare the performance with vllm/tgit/lightllm or other llm serving framework?
#4 opened by irasin - 2
ppl on ptb
#3 opened by MrDoghead - 1
issue with `c4` dataset for eval
#2 opened by HamidShojanazeri