smoothquant

There are 2 repositories under the smoothquant topic.

  • intel/neural-compressor

    SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

    Language: Python

  • ModelTC/LightCompress

    A powerful toolkit for compressing large models including LLM, VLM, and video generation models.

    Language: Python