intel/neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

PythonApache-2.0

Issues

Failed to save quantized model
#2001 opened a month ago by lockeregg
10
[SmoothQuant] Folding the Mul op raise mismatch tensor's size error
#2050 opened 2 months ago by duanshengliu
1
Numba package requried for int-4 quantization
#2031 opened 2 months ago by aneelaka-int
4
Error while running Whisper model quantization with Intel neural compressor
#2056 opened 2 months ago by Shivani-k16
0
AssertionError of act_observer when using SmoothQuant for Llama-13b
#2033 opened 3 months ago by kyang-06
0
Coding error！！
#2027 opened 3 months ago by AheadSnail
0
Qwen/Qwen2.5-7B-Instruct model layer_wise_quant function error
#2017 opened 3 months ago by hadoop2xu
0
LLM smoothquant, how to add a customer evaluate func ?
#1999 opened 4 months ago by tianylijun
1
smoothquant， any quant/dequant module can be found in exported quant pt model ?
#1996 opened 4 months ago by tianylijun
0
how to evaluate AWQ ?
#1980 opened 5 months ago by chunniunai220ml
7
NotImplementedError is raised in static INT8 Quantization with PT2E Backend default recipe
#1984 opened 4 months ago by haitamhawa
7
Model Size Increase After PTQ
#1968 opened 5 months ago by zhangxu223
8
Quantization failed
#1972 opened 5 months ago by endomorphosis
1
Why is LayerNorm not quantized to int8 in PTQ?
#1963 opened 5 months ago by zhangxu223
3
Dataset Selection for Post-Training Quantization (PTQ)
#1951 opened 5 months ago by zhangxu223
5
Segmentation fault (core dumped) when compressing the model.
#1538 opened a year ago by SunHaoOne
13
PTQ with IPEX backend and XPU device is not working
#1889 opened 6 months ago by paguilomanas
3
Continue quantization from history.snapshot
#1778 opened 6 months ago by oyazdanb
4
Any example to quantise a text embedding model on Intel Gaudi2?
#1919 opened 6 months ago by sleepingcat4
2
Error in fp8 quantization: Invalid scale factor : 1.70e+06, make sure the scale is not larger than : 6.55e+04
#1907 opened 6 months ago by yyChen233
0
FP4 encoding related
#1891 opened 6 months ago by Tiantian-Han
0
how to extract int8 weights from quantized model
#1817 opened 8 months ago by chensterliu
8
I tried to fine-tune and prune the Helsinki-opus-MT series model, but an error occurred.
#1820 opened 7 months ago by mc112611
3
Is there any accuracy data related to FP4?
#1835 opened 7 months ago by PhzCode
0
How to load quantized LLM and do inference?
#1776 opened 8 months ago by 0400H
5
Per tensor quantization in smoothquant
#1689 opened 8 months ago by chensterliu
2
'q_config' is needed when export an INT8 model
#1736 opened 9 months ago by ZhangShuoAlreadyExists
3
how to get smoothed model before do quantization?
#1626 opened 9 months ago by chunniunai220ml
22
AWQ Quantization padding error
#1699 opened 9 months ago by PatriceVignola
1
RTN sym behavior is not aligned
#1695 opened 9 months ago by wenhuach21
1
AWQ fails on ONNX model when a MatMul node's input is a model input/initializer
#1571 opened a year ago by jstoecker
1
Quantized Neural compress model not generating expected results in AMD processor
#1531 opened a year ago by Bhuvaneswaran-R
3
io.UnsupportedOperation: fileno
#1714 opened 9 months ago by jashokkumar83
1
Using Neural Compressor on cloud resources
#1678 opened 9 months ago by rmiller3
11
AutoRound sym is not aligned
#1696 opened 10 months ago by wenhuach21
0
PytorchBasicPruner bug?
#1679 opened 10 months ago by tchittesh
1
Model execution is single threaded?
#1663 opened 10 months ago by akhauriyash
1
neural_compressor/adaptor/ox_utils/quantizer.py dfs crash during "basic" tuning
#1621 opened a year ago by kmn1024
1
How to quantify google/vit-base-patch16-224 pytorch_model.bin to int8 type with neural-compressor
#1612 opened a year ago by yingmuying
3
Time based TuningCriterion to keep best performing model?
#1617 opened a year ago by kmn1024
4
PostTrainingQuantConfig(quant_level='auto', device='npu', backend="onnxrt_dml_ep") produces fp32 ops.
#1580 opened a year ago by kleiti
1
AWQ quantization is very slow for ONNX LLMs
#1609 opened a year ago by PatriceVignola
1
How to perform int8 quantisation (not uint8) using ONNX?
#1610 opened a year ago by paul-ang
1
Unable to save llama2 after SmoothQuant
#1600 opened a year ago by dellamuradario
1
how to get layer_mappings for distillation?
#1590 opened a year ago by Michael-Fuu
1
Quantization of MaxVit model
#1576 opened a year ago by mkompanek
2
get_number_of_sockets should take locale into account
#1588 opened a year ago by sonald
2
AssertionError: Framework is not detected correctly from model format.
#1534 opened a year ago by AnindyaSD
10
tensorflow is EOL, will there be a pytorch compatible version?
#1535 opened a year ago by thistleknot
1
potential bug in calculating scale,zp in sq
#1533 opened a year ago by wenhuach21
0