intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Python · Apache-2.0
Stargazers
- adelC (Intel)
- agnotta
- aklingam7 (My Laptop)
- AlanChou (Synopsys)
- ashokei
- Atlast7
- bab2min (Kakao Corp.)
- Baran-phys (Origins Cluster | TUM)
- bmyrcha (Gdansk, Poland)
- chuanqi129 (Shanghai)
- ClarkChin08 (Shanghai, China)
- CloudFlyCN (Shanghai)
- davidas1
- dkankows (Intel)
- dmitryte (Intel)
- ftian1 (Intel)
- gglin001
- guomingz (@intel)
- justusschock (@Lightning-AI)
- kalyangvs (University of Florida)
- lcskrishna (AMD)
- lichangqing2611
- Lu-Tan
- mengniwang95
- mfuntowicz (@huggingface)
- PenghuiCheng (Intel)
- peteriz (@IntelLabs)
- prodyut
- psakamoori (Intel Corporation)
- signorgelato (Chicago)
- sixcluster (China)
- tariqafzal
- tybulewicz (Poland)
- vdevaram
- vfdev-5 (@Quansight)
- wincent8