casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
Python · MIT License
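For orientation before the issue list, here is a minimal quantize-and-reload sketch following the quickstart pattern in the AutoAWQ documentation; the model path, output directory, and quantization settings are illustrative assumptions, not values taken from this page.

```python
# Minimal AutoAWQ usage sketch (paths and config values are placeholders).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"   # assumed example model
quant_path = "mistral-7b-instruct-awq"              # assumed output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and tokenizer, run AWQ 4-bit quantization, and save.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Reload the quantized checkpoint for inference (where the speedup applies).
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
```

Note that `q_group_size` must evenly divide each linear layer's input dimension, which is what the `in_features % group_size` assertion in issue #602 below checks.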
Issues
assert self.in_features % self.group_size == 0
#602 opened by LDLINGLINGLING - 1
Unable to Run on Colab
#629 opened by JosephGatto - 3
Is it possible to remove the fixed torch version?
#580 opened by qinxuye - 3
Model quantization error
#598 opened by sailfish009 - 7
Can't import awq
#559 opened by Dujianhua1008 - 4
How is Llava quantized?
#621 opened by Abhranta - 0
Will T5 support be included?
#622 opened by Jason202268 - 4
Problem with Gemma 2 27B
#627 opened by Alireza3242 - 0
How to Split AWQ Weights?
#626 opened by Azure-Tang - 1
Model Support
#625 opened by SinanAkkoyun - 1
Why do you handle the dataset in this way?
#619 opened by lzcchl - 0
Can AutoAWQ support W8A16 quantization?
#624 opened by wangzhongren-code - 2
NotImplementedError: Only 4-bit are supported for now.
#575 opened by HRuii1 - 0
MMLU eval failed on ROCm
#623 opened by chunniunai220ml - 0
TypeError: internvl_chat isn't supported yet.
#618 opened by Jeremy-J-J - 0
Reads out FP16 parameters after quantization
#617 opened by Jason202268 - 0
Can AutoAWQ be used on hiascend NPUs?
#616 opened by fanacio - 0
Where is convert_awq_to_npu.py?
#615 opened by fanacio - 0
Compressing quantized weights
#614 opened by laiviet - 0
Continuous batching
#613 opened by SinanAkkoyun - 1
Can you give me some advice about parameter settings?
#612 opened by lzcchl - 0
`get_best_device()` can lead to CUDA OOM
#611 opened by johannaSommer - 0
The efficiency of duo_scaling
#609 opened by Skyseaee - 4
Hello, I get "UserWarning: AutoAWQ could not load GEMM kernels extension". Does this mean GEMM does not take effect?
#579 opened by wy200507030 - 3
Error when using AutoAWQ to quantize Qwen2-72B-Instruct
#577 opened by ving666 - 2
Multi-Node Quantization using Ray?
#601 opened by paolovic - 2
Support torch 2.4.0
#590 opened by stenreijers - 0
Does AutoAWQ support multi-threading on CPU?
#597 opened by sdecoder - 0
Quantize a 70B model on an 80GB VRAM VM
#585 opened by carstendraschner - 0
Possible unnecessary line in quantizer.py
#588 opened by Ali-Flt - 2
RuntimeError: CUDA error: no kernel image is available for execution on the device
#584 opened by noaebbot - 2
Converting a LoRA-finetuned Llama 3.1 model into AWQ
#583 opened by fusesid - 0
group_size=-1 is not supported
#578 opened by weiwei567 - 7
Error when quantizing the Qwen2-7B model
#574 opened by XiaoYu2022 - 1
What's the difference between llm-awq and this?
#563 opened by LiMa-cas - 0
For AutoAWQ models like Qwen2, Mistral, and Aquila, the Fuser class should pass the `rope_theta` argument when initializing the LlamaLikeBlock.
#567 opened by Shuai-Xie - 2
About the shape of qzeros in the AWQ quantized model
#566 opened by MuYu-zhi - 1