GPTQModel

Production-ready LLM model compression/quantization toolkit with accelerated inference support on both CPU and GPU via HF (Transformers), vLLM, and SGLang.
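A minimal sketch of a typical quantize-and-save workflow with this toolkit. The model id, calibration texts, and output path are placeholders, and the exact API surface (`GPTQModel.load`, `QuantizeConfig`, `model.quantize`, `model.save`) should be checked against the project's own documentation:

```python
# Hypothetical 4-bit GPTQ quantization sketch; names and defaults are assumptions.
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-1B-Instruct"   # placeholder source model
quant_path = "Llama-3.2-1B-Instruct-gptq-4bit"  # placeholder output directory

# A small calibration set of representative text samples.
calibration_dataset = [
    "GPTQ quantizes weights layer by layer using calibration data.",
    "Quantized models reduce memory use and speed up inference.",
]

# 4-bit weights with a group size of 128 is a common GPTQ configuration.
quant_config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load(model_id, quant_config)   # load FP model + quant config
model.quantize(calibration_dataset)              # run GPTQ calibration/quantization
model.save(quant_path)                           # write quantized weights to disk
```

The saved directory can then be loaded back for inference through the supported backends (HF Transformers, vLLM, or SGLang), again per the project's documentation.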

Primary language: Python · License: Apache-2.0

This repository is not active