smoothquant
There are 2 repositories under the smoothquant topic.
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
ModelTC/LightCompress
A powerful toolkit for compressing large models, including LLMs, VLMs, and video generation models.