OpenGVLab/OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

PythonMIT

Issues

The llama-2-7b model can't quant in this code
#93 opened a month ago by Hzqskywkr
1
Obtained different PPL for Wikitext and C4 compared to results reported in the paper
#95 opened 20 days ago by yc2367
0
Performance gap with Llama-2-7B
#94 opened 21 days ago by Xzk7
0
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)
#89 opened 2 months ago by mcpaulgeorge
4
Error when evaluating MMLU
#91 opened a month ago by zjq0455
2
How to generalize LET to llama3?
#92 opened a month ago by zjq0455
0
how to enable llama3-8b int4 awq models
#90 opened a month ago by FlexLaughing
0
The version of transformers, auto_gptq, autoawq
#88 opened 2 months ago by zhangfzR
0
Llama-3-8B
#75 opened 5 months ago by hsb1995
5
Quantify tinyllama-1.1B-Chat-v1.0, a CUDA assertion error occurred
#87 opened 3 months ago by causeof-you
0
[New Feature] Seek MLA Supported by Smooth
#86 opened 3 months ago by RanchiZhao
0
question about let
#85 opened 3 months ago by mxjmtxrm
0
[Model Request] MiniCPM
#84 opened 3 months ago by RanchiZhao
0
The ckpt of Quantized OPT model is not be found
#53 opened 9 months ago by liuxy1103
6
When reproducing evaluation results for Llama-2-13b w4a4, I got nan
#69 opened 3 months ago by NewDriverLee
4
The llama-1-65b model seems unstable in this code
#83 opened 3 months ago by Xingrun-Xing
2
AttributeError: 'FalconAttention' object has no attribute 'maybe_rotary'
#71 opened 6 months ago by luchangli03
1
Questions about quantization
#81 opened 4 months ago by mxjmtxrm
0
Questions about quantization
#82 opened 4 months ago by mxjmtxrm
0
How to accelerate the inference speed with real_quant
#80 opened 4 months ago by j2kim99
3
Questions regarding Infusing Omniquant into MLC
#77 opened 5 months ago by BuildBackBuehler
3
Which bug do you fix for auto_gptq
#79 opened 4 months ago by BaohaoLiao
1
KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
#68 opened 5 months ago by zfstr
2
Some questions about the results of weight only quantification in the paper
#78 opened 5 months ago by everloom
0
OPT-30B
#76 opened 5 months ago by Arthur-Ling
0
Is activation get quantized on-the-fly?
#74 opened 5 months ago by XA23i
5
Difference between fake quant and real quant
#61 opened 5 months ago by YihengBrianWu
1
CUDA extension not installed
#62 opened 5 months ago by Arthur-Ling
2
Checksums didn't match for dataset source files
#65 opened 5 months ago by hsb1995
7
seq_len is deprecated and unused in transformers>=4.38.0
#66 opened 5 months ago by Lokshaw-Chau
1
W4A4 in llama2-7b
#70 opened 5 months ago by chenzx921020
4
Why is the compressed file one file instead of the pre trained weights, where there are many files for training the mode
#73 opened 6 months ago by hsb1995
1
TypeError: FalconRotaryEmbedding.forward() missing 1 required positional argument: position_ids
#72 opened 6 months ago by luchangli03
0
[Model Request] upstage/SOLAR-10.7B-v1.0
#45 opened 6 months ago by joseph777111
1
AutoGPTQ or AutoGPTQ-bugfix?
#57 opened 8 months ago by Alvant
7
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed).
#64 opened 7 months ago by zkf331
3
Other Task
#67 opened 6 months ago by hsb1995
1
potential bug about matmul quantization process?
#38 opened 7 months ago by brisker
1
How to properly evaluate W6A6 models using checkpoint from the mode zoo
#59 opened 8 months ago by ChengZhang-98
2
How to use AutoGPTQ to achieve real quantization?
#50 opened 7 months ago by AboveParadise
3
OPT Model Reproduction Discrepancies
#63 opened 7 months ago by fantasysee
2
reproduce evaluation results
#60 opened 8 months ago by oujieww
9
License
#55 opened 8 months ago by fakerybakery
2
Quantize Llama-2-Chat Models with Weights and Activation-Quantization
#52 opened 9 months ago by DRXD1000
2
[Llama-2-7B-chat] ppl of w4a8 is nan
#51 opened 9 months ago by xingchensong
4
attention_mask may appear None for newer versions of LLaMA?
#46 opened 9 months ago by Alvant
3
TypeError: QuantLlamaDecoderLayer.forward() got an unexpected keyword argument 'padding_mask'
#44 opened 9 months ago by xianwujie
1
general question about LLM kv-cache quantization
#41 opened 9 months ago by brisker
1
[Model Request] Mixtral-8x7B-v0.1
#40 opened 9 months ago by joseph777111
3
AttributeError: 'Attention' object has no attribute 'W_pack'
#39 opened 10 months ago by yrf200112
0