Issues
Non-strict loading of the state dict
#278 opened by BenjaminBossan - 0
TypeError: _to_copy() takes from 2 to 3 positional arguments but 4 were given
#289 opened by arseniybelkov - 1
unsupported Microsoft Visual Studio version!
#288 opened by MMundane - 2
This is not allowed since there's already a kernel registered from python overriding unpack's behavior for CPU dispatch key and quanto_ext namespace.
#285 opened by arseniybelkov - 4
Moving qint4 models takes a large amount of time
#270 opened by gabe56f - 5
Use correct float8 quantization range in MaxOptimizer
#240 opened by dacorvo - 6
Verify extension behaviour in Google Colab
#206 opened by dacorvo - 0
Support for FP8 Matmuls
#275 opened by maktukmak - 7
Switch to ruff native formatter
#186 opened by dacorvo - 3
Support for new diffuser: flux1.schnell
#272 opened by KoppAlexander - 7
`qint4` failing with PixArt Transformer
#228 opened by sayakpaul - 1
Errors when applied to Lumina-Next
#269 opened by phil329 - 5
optimized kernel for quanto::dqmm not found
#203 opened by kechan - 3
Inference from a reloaded quantized OpenCLIP model (via .load_state_dict) results in IndexError
#217 opened by kechan - 3
Incompatibility with `torch.compile()`
#221 opened by sanchit-gandhi - 5
Investigate: densely pack scale+shift tensors into the weight tensors for highly quantized tensors
#266 opened by maruel - 0
When running with optimum-quanto, why isn't there a large reduction in GPU memory?
#265 opened by lonngxiang - 2
Is vLLM supported?
#220 opened by RanchiZhao - 2
Error: conv2d() received an invalid combination of arguments after quantizing the model
#256 opened by KhaoKhao - 3
fp8 leads to black images (numerical instabilities) for transformer diffusion models
#231 opened by sayakpaul - 11
Why is the quantized net slower?
#184 opened by theguardsgod - 1
Should we stop using quanto without optimum?
#215 opened by kechan - 1
Your tool seemed useless?
#233 opened by LumenScopeAI - 2
Why missing?
#224 opened by xalteropsx - 1
CUDA Kernel
#214 opened by satabios - 2
Unable to quantize a single linear layer: throws ValueError: Cannot quantize Tensor of shape torch.Size([1, 10]) along axis 0 of size 1
#192 opened by rajat-008 - 2
[Feature Request] FP6 🤗
#189 opened by NicolasMejiaPetit - 0
quanto_cuda.so: cannot open shared object file: No such file or directory
#207 opened by nuclear-missile - 8
Quantized CLIPModel inference not noticeably faster (or even slower) than non-quantized
#202 opened by kechan - 7
Got stuck when training resnet50 with QAT
#183 opened by catsled - 3
ValueError: The model is quantized with QuantizationMethod.QUANTO and is not serializable - check out the warnings from the logger on the traceback to understand the reason why the quantized model is not serializable.
#188 opened by gospacedev - 2
[Feature Request] INT16 🤗
#190 opened by duanshengliu