spcl/QuaRot

Code for the NeurIPS 2024 paper: QuaRot, end-to-end 4-bit inference of large language models.

Python · Apache-2.0

Issues

question about Hadamard dimension
#44 opened a month ago by mxjmtxrm
1
How is perplexity calculated with the KV cache?
#42 opened a month ago by tsengalb99
1
Reproducing paper Table 8
#43 opened a month ago by mjyun01
1
Question about rotation.
#21 opened 5 months ago by mxjmtxrm
3
[Inference speed] Speed up on prefilling stage, slow down on decoding stage
#38 opened 3 months ago by ChenMnZ
3
[question] Is it possible to quantize Mixtral?
#6 opened 7 months ago by accupham
3
[Q] Having not matched size Hadamard matrix
#41 opened 3 months ago by Coco58323
5
apply_exact_had_to_linear for v_proj.bias if v_proj.bias is not None
#40 opened 3 months ago by dyou-dev
1
A question regarding the rotation matching pairs
#36 opened 3 months ago by Menace-Dragon
1
questions about the rotate
#39 opened 3 months ago by Gloria2tt
1
Accuracy drop after `fuse_layer_norms`
#34 opened 4 months ago by Niko-zyf
1
[Small Bug] The embedding fusion is not necessary for LLaMA models.
#7 opened 7 months ago by ChenMnZ
6
Inference
#37 opened 3 months ago by zhentingqi
2
Question about whether it is necessary to fuse layernorm to linear
#8 opened 6 months ago by Oliver-ss
14
opt model ppl bug
#12 opened 7 months ago by zhsky2017
3
When is online Hadamard applied during evaluation?
#32 opened 4 months ago by pavelgolikov
1
Mistral support
#35 opened 3 months ago by DavidePaglieri
1
mlp_sizes seem wrong in qlinear_benchmark.py
#33 opened 3 months ago by yyfcc17
4
args.distribute_model seems to be undefined
#31 opened 4 months ago by WeiMa01
3
Questions about reproduction of weight-only quantization.
#3 opened 7 months ago by ChenMnZ
6
Outputs of OPT models become different after fusing LayerNorm.
#30 opened 4 months ago by SShock92
3
Other quantization results of rotated model
#25 opened 4 months ago by mxjmtxrm
8
opt model with layernorm, the input of layernorm can use hadamard transform?
#29 opened 4 months ago by JiangYongYu1
4
How to deal with GQA?
#20 opened 5 months ago by mxjmtxrm
1
Relations with SpinQuant?
#28 opened 4 months ago by RanchiZhao
3
How to get models with only offline rotation (or models for weight-only quantization)
#24 opened 5 months ago by Tracin
6
accuracy of weight only quantization decrease significantly after weight rotation
#22 opened 5 months ago by luchangli03
12
Does QuaRot only support Llama and OPT style LLM?
#27 opened 5 months ago by NicoNico6
1
Question about Hadamard transformation and outlier reduction
#26 opened 5 months ago by KimythAnly
2
Question about exact_had_to_linear
#23 opened 5 months ago by mxjmtxrm
1
Wrong result obtained in case of w4a16 quantization?
#16 opened 5 months ago by hyx1999
2
multi GPU inference
#19 opened 5 months ago by hensiesp32
1
Questions related to Compile the QuaRot on CPU and Model Saving
#15 opened 5 months ago by HuangOwen
1
Question about reproducing Fig.1
#14 opened 5 months ago by xinghaow99
4
How to get a fake quantized model?
#18 opened 5 months ago by mxjmtxrm
1
Can we directly load a QuaRot-GPTQ quantized model and do lm_eval evaluation?
#13 opened 6 months ago by Shuai-Xie
1
Questions on online quantization
#11 opened 7 months ago by lzhangzz
4
Some questions
#9 opened 7 months ago by catid
1
Online hadamard bug
#10 opened 7 months ago by nailimixaM
0
Do I need to use merge a hadamard matrix into W_v if I only want to do 4 bit KV caching?
#5 opened 7 months ago by YLGH
4
Applying rotation to HuggingFace model
#1 opened 7 months ago by YLGH
12
Question about online hadamard transformation before down-proj and o_proj
#4 opened 7 months ago by ChenMnZ
1