Issues
kernel optimized for A100
#25 opened by lisuying214 - 1
e2e demonstration for bigger models
#24 opened by jiwonsong-dev - 3
Question about KV Cache quantization
#23 opened by SherrySwift - 2
TypeError: QLlamaDecoderLayer.forward() got an unexpected keyword argument 'cache_position'
#19 opened by galenyu - 2
LLM model load hanging problem
#18 opened by jimmy-adams - 3
Question regarding the efficiency evaluation
#17 opened by FlyFoxPlayer - 3
The question about calib data
#15 opened - 1
How to load quantized weight?
#14 opened - 10
RuntimeError when quant llama model
#12 opened - 1
AssertionError
#9 opened by muzi0111 - 1
error: same device
#8 opened by muzi0111 - 3
the ppl for llama-7b is very large
#7 opened by priscilla-pan - 3
how to compare the performance with vllm/tgit/lightllm or other llm serving framework?
#4 opened by irasin - 2
ppl on ptb
#3 opened by MrDoghead - 1
issue with `c4` dataset for eval
#2 opened by HamidShojanazeri