intel/neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

Python, Apache-2.0

Readme
206 Issues
2.2k Stargazers
33 Watchers

