intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Python · Apache-2.0
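Many of the issues below exercise the library's post-training quantization entry point. As a point of reference, here is a minimal PTQ sketch against the 2.x Python API (`PostTrainingQuantConfig` and `quantization.fit`, both of which appear in issue titles below); `float_model` and `calib_dataloader` are illustrative placeholders, not names from this repository:

```python
# Minimal post-training quantization sketch, assuming neural-compressor 2.x.
# `float_model` (an FP32 torch.nn.Module) and `calib_dataloader` (a DataLoader
# of representative inputs) are illustrative placeholders.
from neural_compressor import PostTrainingQuantConfig, quantization

conf = PostTrainingQuantConfig()            # defaults to static INT8 PTQ
q_model = quantization.fit(
    model=float_model,                      # FP32 model to quantize
    conf=conf,
    calib_dataloader=calib_dataloader,      # calibration data for activation ranges
)
q_model.save("./quantized_model")           # persist the tuned model
```

`quantization.fit` also accepts an `eval_func` callable, the usual hook for attaching a custom accuracy check during tuning (what issue #1999 below asks about for SmoothQuant).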
Issues
- Failed to save quantized model (#2001, opened by lockeregg, 1 comment)
- Numba package required for INT4 quantization (#2031, opened by aneelaka-int, 0 comments)
- Error while running Whisper model quantization with Intel Neural Compressor (#2056, opened by Shivani-k16, 0 comments)
- Coding error!! (#2027, opened by AheadSnail, 0 comments)
- LLM SmoothQuant: how to add a custom evaluation function? (#1999, opened by tianylijun, 0 comments)
- SmoothQuant: can any quant/dequant modules be found in the exported quantized PT model? (#1996, opened by tianylijun, 7 comments)
- How to evaluate AWQ? (#1980, opened by chunniunai220ml, 7 comments)
- NotImplementedError is raised in static INT8 quantization with the PT2E backend default recipe (#1984, opened by haitamhawa, 8 comments)
- Model Size Increase After PTQ (#1968, opened by zhangxu223, 1 comment)
- Quantization failed (#1972, opened by endomorphosis, 3 comments)
- Why is LayerNorm not quantized to INT8 in PTQ? (#1963, opened by zhangxu223, 5 comments)
- Dataset Selection for Post-Training Quantization (PTQ) (#1951, opened by zhangxu223, 13 comments)
- PTQ with IPEX backend and XPU device is not working (#1889, opened by paguilomanas, 4 comments)
- Continue quantization from history.snapshot (#1778, opened by oyazdanb, 2 comments)
- Error in FP8 quantization: "Invalid scale factor: 1.70e+06, make sure the scale is not larger than: 6.55e+04" (#1907, opened by yyChen233, 0 comments)
- FP4 encoding related (#1891, opened by Tiantian-Han, 8 comments)
- How to extract INT8 weights from a quantized model (#1817, opened by chensterliu, 3 comments)
- I tried to fine-tune and prune the Helsinki-opus-MT series model, but an error occurred (#1820, opened by mc112611, 0 comments)
- Is there any accuracy data related to FP4? (#1835, opened by PhzCode, 5 comments)
- How to load a quantized LLM and do inference? (#1776, opened by 0400H, 2 comments)
- Per-tensor quantization in SmoothQuant (#1689, opened by chensterliu, 3 comments)
- How to get the smoothed model before doing quantization? (#1626, opened by chunniunai220ml, 1 comment)
- AWQ quantization padding error (#1699, opened by PatriceVignola, 1 comment)
- RTN sym behavior is not aligned (#1695, opened by wenhuach21, 1 comment)
- AWQ fails on ONNX model when a MatMul node's input is a model input/initializer (#1571, opened by jstoecker, 3 comments)
- Quantized Neural Compressor model not generating expected results on AMD processor (#1531, opened by Bhuvaneswaran-R, 1 comment)
- io.UnsupportedOperation: fileno (#1714, opened by jashokkumar83, 11 comments)
- Using Neural Compressor on cloud resources (#1678, opened by rmiller3, 0 comments)
- AutoRound sym is not aligned (#1696, opened by wenhuach21, 1 comment)
- PytorchBasicPruner bug? (#1679, opened by tchittesh, 1 comment)
- Model execution is single-threaded? (#1663, opened by akhauriyash, 1 comment)
- neural_compressor/adaptor/ox_utils/quantizer.py DFS crash during "basic" tuning (#1621, opened by kmn1024, 3 comments)
- How to quantize google/vit-base-patch16-224 pytorch_model.bin to INT8 with neural-compressor (#1612, opened by yingmuying, 4 comments)
- Time-based TuningCriterion to keep the best-performing model? (#1617, opened by kmn1024, 1 comment)
- PostTrainingQuantConfig(quant_level='auto', device='npu', backend="onnxrt_dml_ep") produces fp32 ops (#1580, opened by kleiti, 1 comment)
- AWQ quantization is very slow for ONNX LLMs (#1609, opened by PatriceVignola, 1 comment)
- How to perform INT8 quantization (not UINT8) using ONNX? (#1610, opened by paul-ang, 1 comment)
- Unable to save llama2 after SmoothQuant (#1600, opened by dellamuradario, 1 comment)
- How to get layer_mappings for distillation? (#1590, opened by Michael-Fuu, 2 comments)
- Quantization of MaxVit model (#1576, opened by mkompanek, 2 comments)
- get_number_of_sockets should take locale into account (#1588, opened by sonald, 10 comments)
- Potential bug in calculating scale/zero point in SmoothQuant (#1533, opened by wenhuach21)
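Several of the issues above (#2001, #1600, #1776) concern saving a quantized model and loading it back for inference. Here is a sketch of the corresponding load path, assuming the 2.x PyTorch flow and the `./quantized_model` directory written in the earlier example; `fp32_model` and `example_inputs` are illustrative placeholders:

```python
# Reloading a saved quantized model, assuming the neural-compressor 2.x
# PyTorch path: the loader reconstructs the quantized modules on top of an
# FP32 model with the same architecture as the one that was quantized.
import torch
from neural_compressor.utils.pytorch import load

int8_model = load("./quantized_model", fp32_model)  # fp32_model: illustrative placeholder
int8_model.eval()

with torch.no_grad():
    outputs = int8_model(example_inputs)            # example_inputs: a sample batch
```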