mit-han-lab/qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Python · Apache-2.0
Issues
use qserve with tensorrt-llm raise an error
#45 opened by anaivebird - 0
How to test the accuracy?
#42 opened by lisuying214 - 4
Would this work on consumer hardware and integrated in frameworks like llama.cpp or others?
#5 opened by Mayorc1978 - 0
Is an OpenAI-compatible server supported?
#43 opened by anaivebird - 0
pip install -e .
#41 opened by lisuying214 - 0
Some questions about VLM quant
#40 opened by hanhanpp - 1
Question about pagedattention
#36 opened by SherrySwift - 1
LLama-3-8B model dumped by LMQuant in 4w8a set raises errors when running e2e benchmark in QServe.
#29 opened by Patrick-Lew - 1
[inf nan] got `inf`, `nan` or element < 0
#38 opened by yunyipower - 0
How to add new models?
#33 opened by NicolasDrapier - 0
RMSNorm implemented as LayerNorm
#32 opened by jason-huang03 - 0
[New Feature] Will MLA Be Supported?
#28 opened by RanchiZhao - 2
The output of the given model (mit-han-lab/Llama-3-8B-QServe-g128) is mistaken
#21 opened by haichuan1221 - 0
[New Model Supported] MiniCPM-2B
#24 opened by RanchiZhao - 3
Question about dequantization overhead
#23 opened by DD-DuDa - 0
Questions about FP8 and H100
#19 opened by sijiac - 2
Circular import error
#22 opened by LuckyLYM - 1
Is the Table.3 accuracy tested with dequantized weights, or tested on real accelerated quantized kernels?
#17 opened by vovoluck - 1
Expected speed for llama3-70b-instruct
#18 opened by ethxnp - 2
support tp
#14 opened by cyLi-Tiger - 0
has anyone tried to HIPify this for AMD/ROCm
#16 opened by ehartford - 0
fast dequantization in per-ch
#15 opened by yanghaihui - 1
activation quantization
#13 opened by hanhanpp - 2
Couldn't instantiate the backend tokenizer
#8 opened by Rudin6 - 1
Any performance comparison with vLLM?
#12 opened by MuYu-zhi - 2
Source code
#3 opened by jph00 - 3
Question about the paper
#10 opened by jameswu2014 - 1
Accuracy on Qwen1.5-72B
#9 opened by cyLi-Tiger - 1
Is 8bit supported?
#2 opened by nivibilla - 1