/llmc

This is the official PyTorch implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models", and also an efficient LLM compression tool with various advanced compression methods, supporting multiple inference backends.

Primary LanguagePythonApache License 2.0Apache-2.0

Watchers