huggingface/quanto

A pytorch Quantization Toolkit

PythonApache-2.0

Issues

Add CUDA kernels for Wint4Afloat16
#111 opened 3 days ago by dacorvo
9
Switch to ruff native formatter
#186 opened a month ago by dacorvo
3
Add examples based on ViT
#169 opened 2 months ago by dacorvo
2
[Feature Request] FP6 🤗
#189 opened a month ago by NicolasMejiaPetit
1
Unable quantize a single linear layer: throws error: ValueError: Cannot quantize Tensor of shape torch.Size([1, 10]) along axis 0 of size 1
#192 opened a month ago by rajat-008
2
VGG19 not using any less VRAM in qint8/qfloat8 than in fp16
#159 opened 4 days ago by jonahclarsen
8
switch setup.py to pyproject.toml
#123 opened 6 days ago by baggiponte
6
Got stuck when train resnet50 with QAT
#183 opened 7 days ago by catsled
7
Can I use quanto on AMD GPU?
#182 opened 8 days ago by catsled
4
Potential readme issue - falls back to original dtype, not fp32
#180 opened 10 days ago by calmitchell617
3
Does quanto work with flashattention?
#127 opened 11 days ago by jzhang38
2
Question: If we have a model wrapped in another class, will that will work with the Calibration mode?
#141 opened 11 days ago by aryanmagoon
3
Write a helper to reload a quantized state_dict
#162 opened 14 days ago by dacorvo
12
Safetensor serialization throws "conv_in.weight.qtype is invalid expected torch.Tensor but received string"
#165 opened 14 days ago by lsb
4
Feature Request/Int4 Cuda Kernels
#142 opened 15 days ago by NicolasMejiaPetit
2
Quanto scale values seem unpopulated in quantized model
#155 opened 15 days ago by raunaks13
3
Question: any plan to formally support smooth quantization and make it more general
#161 opened 15 days ago by yiliu30
3
1.58 bit quantization
#176 opened 16 days ago by leo-gan
1
on cpu, i can not use quanto very well , RuntimeError: Input type (float) and bias type (c10::Half) should be the same
#172 opened 17 days ago by carsonche
6
Saving and loading quantized models doesn't work?
#136 opened 2 months ago by tanishqkumar
17
Why the quantized net is slower?
#184 opened a month ago by theguardsgod
1
Force a recompilation of the extensions when upgrading pytorch
#194 opened a month ago by dacorvo
0
ValueError: The model is quantized with QuantizationMethod.QUANTO and is not serializable - check out the warnings from the logger on the traceback to understand the reason why the quantized model is not serializable.
#188 opened a month ago by gospacedev
3
[Feature Request] INT16 🤗
#190 opened a month ago by duanshengliu
2
Question about the gradient of QTensor and QBitTensor
#146 opened 2 months ago by shuokay
6
Investigate the benefits of fused mixed-precision matrix multiplications
#96 opened a month ago by dacorvo
2
Add a StableDiffusion example
#170 opened 2 months ago by dacorvo
0
AttributeError: 'str' object has no attribute 'detach'
#175 opened 2 months ago by Gooddz1
5
on cpu, i can not use quanto very well , RuntimeError: Input type (float) and bias type (c10::Half) should be the same
#173 opened 2 months ago by carsonche
0
pull request template
#166 opened 2 months ago by ManoBharathi93
1
Usage on TPU
#149 opened 2 months ago by Locutusque
1
How does quanto calibrate torch functions?
#152 opened 2 months ago by shuokay
6
How dose quanto support int8 conv2d and linear?
#158 opened 2 months ago by zhexinli
2
Should quanto use int dtype in AffineQuantizer instead of uint?
#134 opened 2 months ago by shuokay
4
Is there any plan to add the function to export ONNX for quantized models or to inference on TVM compiler?
#125 opened 2 months ago by ntkhoa95
1
Performance of quanto quants vs bnb, AWQ, GPTQ, GGML ?
#129 opened 2 months ago by nnethercott
1
Dequantizing tensors using quanto
#139 opened 2 months ago by raunaks13
1
quantize() returns None with VGG19
#157 opened 2 months ago by jonahclarsen
3
Why does QTensor need both `__torch_function__` and `__torch_dispatch__` by design?
#156 opened 2 months ago by shuokay
1
Add hqq Optimizer
#133 opened 2 months ago by dacorvo
1
Add MSE Optimizer
#135 opened 2 months ago by shuokay
3
Inference Speed of Quantized Model
#151 opened 2 months ago by mdatres
5
QLinear quantised scale tensor
#137 opened 2 months ago by mdatres
4
Add Percentile Optimizer
#143 opened 2 months ago by shuokay
6
Error when quantizing network and running calibration.
#150 opened 2 months ago by lambertsbennett
3
QModuleMixin: Make activation support optimizer
#144 opened 2 months ago by shuokay
0
Is there a plan to add more flexible methods for setting quantization configs
#124 opened 2 months ago by shuokay
5
Introduce optimizers
#131 opened 2 months ago by dacorvo
4
Incompatible with safetensors serialization
#100 opened 3 months ago by SunMarc
5
Batch matrix multiplication test fails for MPS device on float16 precision
#105 opened 3 months ago by alejandroarmas
2