Issues
When I use SqueezeLLM to quantize the LLaMA2-13B model and test it, the speed is extremely slow.
#71 opened by zhangfzR · 0 comments
Why does LLaMA-2-7B have s0 quantized models, but no s5 and s45 sparsity quantized models?
#68 opened by Evane5cence · 0 comments
Further speeding up the quantization process
#67 opened by SyphonArch · 0 comments
Installation instructions did not lead to the local transformers version being selected, causing errors
#66 opened by RDouglasSharp · 0 comments
Support JAIS models
#65 opened by 7ossam81 · 0 comments
Dense-only quantization bit precision
#63 opened by akarkim · 2 comments
On an A100 card, the speed-up effect does not show up.
#51 opened by leocnj · 0 comments
D+S packing in vLLM seems buggy
#62 opened by MingLin-home · 0 comments
A question about why LLaMA-2-7B and Mistral models only provide Dense-only (0%) quantized models
#56 opened by WeiMa01 · 3 comments
Will it work on a V100 GPU?
#4 opened by Sravanth-k27 · 0 comments
Channel-wise quantization
#52 opened by SoyeonUm · 0 comments
Future plans for this project
#45 opened by tjtanaa · 0 comments
Vicuna-1.5?
#44 opened by mlinmg · 2 comments
Finetune SqueezeLLM
#20 opened by kiucho · 9 comments
Quantisation implementation
#12 opened by huyphan168 · 2 comments
Access to quantisation code
#7 opened by ri938 · 0 comments
Minor bug with --include_sparse
#39 opened by vuiseng9 · 1 comment
Vicuna v1.3
#30 opened by nestordemeure · 1 comment
Add 65B-q3 evaluation
#5 opened by ingenieroariel · 3 comments
Typos in the README.md
#6 opened by matteoguarrera