Issues
- Add CUDA kernels for Wint4Afloat16 (#111, opened by dacorvo, 3 comments)
- Switch to ruff native formatter (#186, opened by dacorvo, 2 comments)
- Add examples based on ViT (#169, opened by dacorvo, 1 comment)
- [Feature Request] FP6 🤗 (#189, opened by NicolasMejiaPetit, 2 comments)
- Unable to quantize a single linear layer: ValueError: Cannot quantize Tensor of shape torch.Size([1, 10]) along axis 0 of size 1 (#192, opened by rajat-008, 8 comments; see the reproduction sketch after this list)
- Switch setup.py to pyproject.toml (#123, opened by baggiponte, 7 comments)
- Got stuck when training resnet50 with QAT (#183, opened by catsled, 4 comments)
- Can I use quanto on AMD GPUs? (#182, opened by catsled, 3 comments)
- Does quanto work with FlashAttention? (#127, opened by jzhang38, 3 comments)
- Question: if we have a model wrapped in another class, will that work with Calibration mode? (#141, opened by aryanmagoon, 12 comments; see the calibration sketch after this list)
- Write a helper to reload a quantized state_dict (#162, opened by dacorvo, 4 comments; see the reload sketch after this list)
- Safetensors serialization throws "conv_in.weight.qtype is invalid expected torch.Tensor but received string" (#165, opened by lsb, 2 comments)
- Feature request: Int4 CUDA kernels (#142, opened by NicolasMejiaPetit, 3 comments)
- Question: any plan to formally support smooth quantization and make it more general? (#161, opened by yiliu30, 1 comment)
- 1.58-bit quantization (#176, opened by leo-gan, 6 comments)
- On CPU, quanto fails with RuntimeError: Input type (float) and bias type (c10::Half) should be the same (#172, opened by carsonche, 17 comments)
- Saving and loading quantized models doesn't work? (#136, opened by tanishqkumar, 1 comment)
- Why is the quantized network slower? (#184, opened by theguardsgod, 0 comments)
- ValueError: The model is quantized with QuantizationMethod.QUANTO and is not serializable - check out the warnings from the logger on the traceback to understand the reason why the quantized model is not serializable (#188, opened by gospacedev, 2 comments)
- [Feature Request] INT16 🤗 (#190, opened by duanshengliu, 6 comments)
- Question about the gradient of QTensor and QBitTensor (#146, opened by shuokay, 2 comments)
- Add a StableDiffusion example (#170, opened by dacorvo, 5 comments)
- On CPU, quanto fails with RuntimeError: Input type (float) and bias type (c10::Half) should be the same (#173, opened by carsonche, 1 comment)
- Pull request template (#166, opened by ManoBharathi93, 1 comment)
- Usage on TPU (#149, opened by Locutusque, 6 comments)
- How does quanto calibrate torch functions? (#152, opened by shuokay, 2 comments)
- How does quanto support int8 conv2d and linear? (#158, opened by zhexinli, 4 comments)
- Is there any plan to support exporting quantized models to ONNX, or running inference via the TVM compiler? (#125, opened by ntkhoa95, 1 comment)
- Dequantizing tensors using quanto (#139, opened by raunaks13, 3 comments)
- quantize() returns None with VGG19 (#157, opened by jonahclarsen, 1 comment)
- Why does QTensor need both `__torch_function__` and `__torch_dispatch__` by design? (#156, opened by shuokay, 1 comment)
- Add hqq Optimizer (#133, opened by dacorvo, 3 comments)
- Add MSE Optimizer (#135, opened by shuokay, 5 comments)
- Inference speed of the quantized model (#151, opened by mdatres, 4 comments)
- QLinear quantised scale tensor (#137, opened by mdatres, 6 comments)
- Add Percentile Optimizer (#143, opened by shuokay, 3 comments)
- QModuleMixin: Make activation support optimizer (#144, opened by shuokay, 5 comments)
- Introduce optimizers (#131, opened by dacorvo, 5 comments)
- Incompatible with safetensors serialization (#100, opened by SunMarc, 2 comments)
- Batch matrix multiplication test fails for MPS device on float16 precision (#105, opened by alejandroarmas)
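
Reproduction sketch for #192: the error quoted in that title is consistent with per-axis weight quantization of a Linear layer whose output dimension is 1 (its weight is a [1, 10] tensor, so axis 0 has size 1). A minimal sketch that should trigger it, assuming the current `optimum.quanto` import path (early releases used plain `quanto`); exactly where the ValueError surfaces (at `quantize` or at `freeze`) is an assumption:

```python
import torch
from torch import nn
from optimum.quanto import freeze, quantize, qint8  # older releases: `from quanto import ...`

# nn.Linear(10, 1) stores its weight as a [1, 10] tensor, so quantizing
# per output channel (axis 0) sees a dimension of size 1 -- the shape
# quoted in the issue title.
model = nn.Sequential(nn.Linear(10, 1))
quantize(model, weights=qint8)
freeze(model)  # expected to raise the ValueError reported in #192
```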
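Reload sketch for the serialization cluster (#162, #136, #165, #188, #100): optimum-quanto later gained `quantization_map` and `requantize` helpers for exactly this workflow. The sketch below assumes those helpers and their recent signatures, not the API state the older issues were filed against; `make_model` and the file names are illustrative:

```python
import json
import torch
from torch import nn
from safetensors.torch import load_file, save_file
from optimum.quanto import freeze, qint8, quantization_map, quantize, requantize

def make_model() -> nn.Module:
    # Illustrative model; any modeling code works, as long as the same
    # architecture is rebuilt before reloading.
    return nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))

model = make_model()
quantize(model, weights=qint8)
freeze(model)

# Persist the quantized weights plus a map of how each module was quantized.
save_file(model.state_dict(), "model.safetensors")
with open("quantization_map.json", "w") as f:
    json.dump(quantization_map(model), f)

# Reload: rebuild the float model, then requantize it from the saved artifacts.
reloaded = make_model()
state_dict = load_file("model.safetensors")
with open("quantization_map.json") as f:
    qmap = json.load(f)
requantize(reloaded, state_dict, qmap, device=torch.device("cpu"))
```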
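Calibration sketch for #141: `quantize()` walks `named_modules()`, so a model wrapped in an outer class should calibrate like an unwrapped one. A minimal sketch assuming the `Calibration` context manager from optimum-quanto; the `Wrapper` class is a hypothetical stand-in for the wrapped model in the question:

```python
import torch
from torch import nn
from optimum.quanto import Calibration, freeze, qint8, quantize

class Wrapper(nn.Module):
    """Illustrative 'model wrapped in another class', as in #141."""
    def __init__(self):
        super().__init__()
        self.inner = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

    def forward(self, x):
        return self.inner(x)

model = Wrapper()
quantize(model, weights=qint8, activations=qint8)
with Calibration():  # records activation ranges during the forward passes below
    for _ in range(8):
        model(torch.randn(4, 8))
freeze(model)
```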