mobiusml/hqq

Official implementation of Half-Quadratic Quantization (HQQ)

PythonApache-2.0

Issues

No benefit from batch inference.
#132 opened 5 months ago by Emily-Ward
4
Support for HQQ Quantization: Compatibility with LLava and Qwen Models?
#125 opened 5 months ago by NEWbie0709
10
Hqq vs gguf
#118 opened 5 months ago by blap
3
CUDA error when trying to use llama3.1 8B 4bit quantized model sample
#120 opened 5 months ago by PatrickDahlin
8
KeyError: 'offload_meta'
#122 opened 5 months ago by kadirnar
1
cache_size_limit reached
#129 opened 5 months ago by zhangy659
22
4bit slower?
#128 opened 5 months ago by zhangy659
3
Activation quantization
#86 opened 5 months ago by kaizizzzzzz
9
integrated into gpt-fast
#119 opened 5 months ago by kaizizzzzzz
1
8bit + Aten + compile
#130 opened 5 months ago by zhangy659
6
Group size and restrictions: documentation and implementation contradict each other
#124 opened 5 months ago by Maykeye
5
slow loading process of pretrained model for finetuning in transformers
#123 opened 5 months ago by jiaqiw09
4
question about fine tune 1bit-quanitzed model
#115 opened 7 months ago by zxbjushuai
35
TypeError: Object of type dtype is not JSON serializable
#107 opened 7 months ago by zxbjushuai
11
Issue when loading the quantized model
#114 opened 7 months ago by NEWbie0709
5
Question about Quantization
#113 opened 7 months ago by NEWbie0709
4
Weight Sharding
#100 opened 7 months ago by winglian
2
Quesiton on the speed for generating the response
#111 opened 7 months ago by NEWbie0709
18
RuntimeError: Expected in.dtype() == at::kInt to be true, but got false.
#108 opened 7 months ago by egorsmkv
9
`hqq/backends/torchao.py` line 177, KeyError: 'scale'
#110 opened 7 months ago by egorsmkv
13
zero and scale quant
#109 opened 7 months ago by kaizizzzzzz
1
Warning: failed to import the BitBlas backend
#105 opened 7 months ago by jinz2014
7
Expected in.dtype() == at::kInt to be true, but got false
#103 opened 7 months ago by jonashaag
14
Easy way to run lm evaluation harness
#104 opened 7 months ago by pythonLoader
1
hqq+ lora ValueError || ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True'
#85 opened 8 months ago by tellyoung
3
Bug of the saved model when applying zero and scale quantization
#102 opened 8 months ago by kaizizzzzzz
1
Support Gemma quantization
#101 opened 8 months ago by kaizizzzzzz
2
RuntimeError: Expected in.dtype() == at::kInt to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
#99 opened 8 months ago by kadirnar
1
3-bit quantization weight data type issue
#97 opened 8 months ago by BeichenHuang
10
About the implentation of .cpu()
#96 opened 8 months ago by reflectionie
1
bitblas introduces dependency on CUDA version
#94 opened 8 months ago by zodiacg
3
OSError: libnvrtc.so.12: cannot open shared object file: No such file or directory
#95 opened 8 months ago by kadirnar
1
module 'torch.library' has no attribute 'custom_op'
#92 opened 8 months ago by fahadh4ilyas
4
2-bit quantization representation
#90 opened 9 months ago by kaizizzzzzz
3
1 bit inference
#88 opened 9 months ago by kaizizzzzzz
4
Weird problem in loading quantized_model + lora_adpter
#89 opened 9 months ago by kaizizzzzzz
0
Group_Size setting
#87 opened 9 months ago by kaizizzzzzz
1
Is HQQLinearLoRAWithFakeQuant differentiable?
#84 opened 9 months ago by lippman1125
1
Question about quantization.
#83 opened 10 months ago by mxjmtxrm
2
AttributeError: 'HQQLinearTorchWeightOnlynt4' object has no attribute 'weight'
#81 opened 10 months ago by ChuanhongLi
7
prepare_for_inference error
#77 opened 10 months ago by BeichenHuang
17
Running HQQ Quantized Models on CPU
#82 opened 10 months ago by 49Simon
3
[Question] Model Outputting Gibberish After Quantization
#80 opened 10 months ago by DefinitlyEvil
4
AttributeError: 'LlamaForCausalLM' object has no attribute '_setup_cache'
#79 opened 10 months ago by ChuanhongLi
3
HQQ for convolutional layers
#78 opened 10 months ago by danishansari
6
Not able to save quantized model
#75 opened 10 months ago by BeichenHuang
5
No module named 'hqq.engine' Error.
#76 opened 10 months ago by yixuantt
2
Can the quantization process be on CPU?
#74 opened a year ago by mxjmtxrm
4
Does it support Hqq optimization algorithm in diffusion models?
#73 opened a year ago by kadirnar
1
Compatibility Issue: TypeError for Union Type Hints with Python Versions Below 3.10
#72 opened a year ago by hjh0119
1