
QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models (ICLR 2024)


This is the official PyTorch implementation of QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models.

By Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, and Bohan Zhuang.


We propose QLLM, an accurate and efficient low-bitwidth post-training quantization method designed for LLMs.

📰 News

  • [10-03-2024] Released the code! 🌟
  • [17-01-2024] QLLM is accepted at ICLR 2024! 👍


🛠 Install

conda create -n qllm python=3.10 -y
conda activate qllm
git clone https://github.com/ModelTC/QLLM
cd QLLM
pip install --upgrade pip 
pip install -e .

โš™๏ธ Usage

We provide the training scripts in the scripts folder. For example, to perform W4A4 quantization for LLaMA-7B, run

sh scripts/llama-7b/w4a4.sh

Remember to set the model path (model) and the output path (output_dir) in the script.
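Here, "W4A4" denotes 4-bit weights and 4-bit activations. As a rough illustration of what uniform low-bit quantization does to a tensor, below is a minimal NumPy sketch of asymmetric fake quantization (quantize, then dequantize). This is a conceptual example only, not the repository's implementation, which builds QLLM's adaptive channel disassembly/reassembly and calibration on top of this basic idea.

```python
import numpy as np

def fake_quantize(x: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Uniform asymmetric fake quantization: map x onto an n-bit integer
    grid, then map back to floats. Conceptual sketch only -- NOT QLLM's
    actual quantizer."""
    qmax = 2 ** n_bits - 1                      # 4 bits -> integer grid 0..15
    x_min, x_max = x.min(), x.max()
    scale = max(x_max - x_min, 1e-8) / qmax     # width of one quantization step
    zero_point = np.round(-x_min / scale)       # integer offset so the grid covers [x_min, x_max]
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax)
    return (q - zero_point) * scale             # dequantize back to float

w = np.random.randn(4, 8).astype(np.float32)
w4 = fake_quantize(w, n_bits=4)                 # at most 2**4 = 16 distinct values
```

In a W4A4 setting, a scheme of this kind is applied to both the weight matrices and the intermediate activations; QLLM's contribution is in making such low-bit grids accurate despite activation outliers.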

📋 Results

  • QLLM achieves state-of-the-art performance in weight-activation quantization.

(Figures: weight-activation quantization results on LLaMA-1 and LLaMA-2.)

๐Ÿ“ Citation

If you find QLLM useful in your research, please consider citing the following paper:

@inproceedings{liu2024qllm,
  title = {{QLLM}: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models},
  author = {Liu, Jing and Gong, Ruihao and Wei, Xiuying and Dong, Zhiwei and Cai, Jianfei and Zhuang, Bohan},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2024},
}

🧾 License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

๐Ÿ™ Acknowledgement

This repository is built upon OmniQuant. We thank the authors for their open-source code.