/auto-round

SOTA Weight-only Quantization Algorithm for LLMs. This is official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"

Primary LanguagePythonApache License 2.0Apache-2.0

Stargazers