casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
Python · MIT License
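For orientation before the issue list, here is a minimal quantize-and-reload sketch following the quickstart pattern in the AutoAWQ documentation; the model path, output directory, and quantization settings are illustrative assumptions, not values taken from this page.

```python
# Minimal AutoAWQ usage sketch (paths and config values are placeholders).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"   # assumed example model
quant_path = "mistral-7b-instruct-awq"              # assumed output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and tokenizer, run AWQ 4-bit quantization, and save.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Reload the quantized checkpoint for inference (where the speedup applies).
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
```

Note that `q_group_size` must evenly divide each linear layer's input dimension, which is what the `in_features % group_size` assertion in issue #602 below checks.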
Issues
assert self.in_features % self.group_size == 0
#602 opened by LDLINGLINGLING - 1
Unable to Run on Colab
#629 opened by JosephGatto - 3
Is it possible to remove the fixed torch version?
#580 opened by qinxuye - 3
Model quantization error
#598 opened by sailfish009 - 7
Can't import awq
#559 opened by Dujianhua1008 - 4
How is Llava quantized?
#621 opened by Abhranta - 0
Will T5 support be included?
#622 opened by Jason202268 - 4
Problem with Gemma 2 27B
#627 opened by Alireza3242 - 0
How to Split AWQ Weights?
#626 opened by Azure-Tang - 1
Model Support
#625 opened by SinanAkkoyun - 1
Why do you handle the dataset in this way?
#619 opened by lzcchl - 0
Can AutoAWQ support W8A16 quantization?
#624 opened by wangzhongren-code - 2
NotImplementedError: Only 4-bit are supported for now.
#575 opened by HRuii1 - 0
MMLU eval failed on ROCm
#623 opened by chunniunai220ml - 0
TypeError: internvl_chat isn't supported yet.
#618 opened by Jeremy-J-J - 0
Reads out FP16 parameters after quantization
#617 opened by Jason202268 - 0
Can AutoAWQ be used on hiascend NPUs?
#616 opened by fanacio - 0
Where is convert_awq_to_npu.py?
#615 opened by fanacio - 0
Compressing quantized weights
#614 opened by laiviet - 0
Continuous batching
#613 opened by SinanAkkoyun - 1
Can you give me some advice about parameter settings?
#612 opened by lzcchl - 0
`get_best_device()` can lead to CUDA OOM
#611 opened by johannaSommer - 0
The efficiency of duo_scaling
#609 opened by Skyseaee - 4
Hello, I get "UserWarning: AutoAWQ could not load GEMM kernels extension". Does this mean GEMM does not take effect?
#579 opened by wy200507030 - 3
Error when using AutoAWQ to quantize Qwen2-72B-Instruct
#577 opened by ving666 - 2
Multi-Node Quantization using Ray?
#601 opened by paolovic - 2
Support torch 2.4.0
#590 opened by stenreijers - 0
Does AutoAWQ support multi-threading on CPU?
#597 opened by sdecoder - 0
Quantize a 70B model on an 80GB VRAM VM
#585 opened by carstendraschner - 0
Possible unnecessary line in quantizer.py
#588 opened by Ali-Flt - 2
RuntimeError: CUDA error: no kernel image is available for execution on the device
#584 opened by noaebbot - 2
Converting a LoRA-finetuned Llama 3.1 model into AWQ
#583 opened by fusesid - 0
group_size=-1 is not supported
#578 opened by weiwei567 - 7
Error when quantizing the Qwen2-7B model
#574 opened by XiaoYu2022 - 1
What's the difference between llm-awq and this?
#563 opened by LiMa-cas - 0
For AutoAWQ models like Qwen2, Mistral, and Aquila, the Fuser class should pass the `rope_theta` argument when initializing the LlamaLikeBlock.
#567 opened by Shuai-Xie - 2
About the shape of qzeros in the AWQ quantized model
#566 opened by MuYu-zhi - 1