Issues
- No benefit from batch inference. (#132, opened by Emily-Ward, 10 comments)
- Hqq vs gguf (#118, opened by blap, 8 comments)
- KeyError: 'offload_meta' (#122, opened by kadirnar, 22 comments)
- cache_size_limit reached (#129, opened by zhangy659, 3 comments)
- 4bit slower? (#128, opened by zhangy659, 9 comments)
- Activation quantization (#86, opened by kaizizzzzzz, 1 comment)
- integrated into gpt-fast (#119, opened by kaizizzzzzz, 6 comments)
- 8bit + Aten + compile (#130, opened by zhangy659, 5 comments)
- Group size and restrictions: documentation and implementation contradict each other (#124, opened by Maykeye, 4 comments)
- Question about fine-tuning a 1bit-quantized model (#115, opened by zxbjushuai, 11 comments)
- Issue when loading the quantized model (#114, opened by NEWbie0709, 4 comments)
- Question about Quantization (#113, opened by NEWbie0709, 2 comments)
- Weight Sharding (#100, opened by winglian, 18 comments)
- Question on the speed for generating the response (#111, opened by NEWbie0709, 9 comments)
- zero and scale quant (#109, opened by kaizizzzzzz, 7 comments)
- Warning: failed to import the BitBlas backend (#105, opened by jinz2014, 14 comments)
- Easy way to run lm evaluation harness (#104, opened by pythonLoader, 3 comments)
- hqq+ lora ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' (#85, opened by tellyoung, 1 comment)
- Support Gemma quantization (#101, opened by kaizizzzzzz, 1 comment)
- RuntimeError: Expected in.dtype() == at::kInt to be true, but got false. (#99, opened by kadirnar, 10 comments)
- 3-bit quantization weight data type issue (#97, opened by BeichenHuang, 1 comment)
- About the implementation of .cpu() (#96, opened by reflectionie, 3 comments)
- bitblas introduces dependency on CUDA version (#94, opened by zodiacg, 1 comment)
- OSError: libnvrtc.so.12: cannot open shared object file: No such file or directory (#95, opened by kadirnar, 4 comments)
- 2-bit quantization representation (#90, opened by kaizizzzzzz, 4 comments)
- 1 bit inference (#88, opened by kaizizzzzzz, 0 comments)
- Group_Size setting (#87, opened by kaizizzzzzz, 1 comment)
- Is HQQLinearLoRAWithFakeQuant differentiable? (#84, opened by lippman1125, 2 comments)
- Question about quantization. (#83, opened by mxjmtxrm, 7 comments)
- AttributeError: 'HQQLinearTorchWeightOnlynt4' object has no attribute 'weight' (#81, opened by ChuanhongLi, 17 comments)
- prepare_for_inference error (#77, opened by BeichenHuang, 3 comments)
- Running HQQ Quantized Models on CPU (#82, opened by 49Simon, 4 comments)
- HQQ for convolutional layers (#78, opened by danishansari, 5 comments)
- Not able to save quantized model (#75, opened by BeichenHuang, 2 comments)
- No module named 'hqq.engine' Error. (#76, opened by yixuantt, 4 comments)
- Can the quantization process be on CPU? (#74, opened by mxjmtxrm, 1 comment)
- Compatibility Issue: TypeError for Union Type Hints with Python Versions Below 3.10 (#72, opened by hjh0119)