Issues
- The Llama-2-7B model can't be quantized with this code (#93, opened by Hzqskywkr, 0 comments)
- Obtained different PPL for WikiText and C4 compared to the results reported in the paper (#95, opened by yc2367, 0 comments)
- Performance gap with Llama-2-7B (#94, opened by Xzk7, 4 comments)
- RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm) (#89, opened by mcpaulgeorge, 2 comments)
- Error when evaluating MMLU (#91, opened by zjq0455, 0 comments)
- How to generalize LET to Llama 3? (#92, opened by zjq0455, 0 comments)
- How to enable Llama-3-8B INT4 AWQ models (#90, opened by FlexLaughing, 0 comments)
- Which versions of transformers, auto_gptq, and autoawq are required? (#88, opened by zhangfzR, 5 comments)
- Llama-3-8B (#75, opened by hsb1995, 0 comments)
- [New Feature] Seeking MLA support via smoothing (#86, opened by RanchiZhao, 0 comments)
- Question about LET (#85, opened by mxjmtxrm, 0 comments)
- [Model Request] MiniCPM (#84, opened by RanchiZhao, 6 comments)
- The checkpoint of the quantized OPT model cannot be found (#53, opened by liuxy1103, 4 comments)
- Questions about quantization (#81, opened by mxjmtxrm, 0 comments)
- Questions about quantization (#82, opened by mxjmtxrm, 3 comments)
- Which bug do you fix for auto_gptq? (#79, opened by BaohaoLiao, 2 comments)
- OPT-30B (#76, opened by Arthur-Ling, 5 comments)
- Are activations quantized on the fly? (#74, opened by XA23i, 1 comment)
- CUDA extension not installed (#62, opened by Arthur-Ling, 7 comments)
- Checksums didn't match for dataset source files (#65, opened by hsb1995, 1 comment)
- W4A4 on Llama-2-7B (#70, opened by chenzx921020, 1 comment)
- Why is the compressed model a single file, rather than multiple files like the pretrained weights? (#73, opened by hsb1995, 0 comments)
- TypeError: FalconRotaryEmbedding.forward() missing 1 required positional argument: 'position_ids' (#72, opened by luchangli03, 1 comment)
- [Model Request] upstage/SOLAR-10.7B-v1.0 (#45, opened by joseph777111, 7 comments)
- AutoGPTQ or AutoGPTQ-bugfix? (#57, opened by Alvant, 3 comments)
- RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed) (#64, opened by zkf331, 1 comment)
- Other tasks (#67, opened by hsb1995, 1 comment)
- Potential bug in the matmul quantization process? (#38, opened by brisker, 2 comments)
- OPT Model Reproduction Discrepancies (#63, opened by fantasysee, 9 comments)
- Reproducing evaluation results (#60, opened by oujieww, 2 comments)
- License (#55, opened by fakerybakery, 2 comments)
- [Llama-2-7B-chat] PPL of W4A8 is NaN (#51, opened by xingchensong, 3 comments)
- TypeError: QuantLlamaDecoderLayer.forward() got an unexpected keyword argument 'padding_mask' (#44, opened by xianwujie, 1 comment)
- General question about LLM KV-cache quantization (#41, opened by brisker, 3 comments)
- [Model Request] Mixtral-8x7B-v0.1 (#40, opened by joseph777111, 0 comments)