
[ICCV 2023] I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference

Primary language: Python. License: Apache-2.0.

I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference

This repository contains the official implementation for the paper "I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference". To the best of our knowledge, this is the first work on integer-only quantization for vision transformers.

Below are instructions for using the PyTorch code to reproduce the accuracy results of quantization-aware training (QAT). The TVM benchmark is the TVM deployment project for reproducing the latency results.

Installation

  • To install I-ViT and develop locally:
git clone https://github.com/zkkli/I-ViT.git
cd I-ViT

QAT Experiments

  • You can quantize and fine-tune a single model using the following command:
python quant_train.py [--model] [--data] [--epochs] [--lr]

optional arguments:
--model: Model architecture. Choices:
         deit_tiny, deit_small, deit_base, swin_tiny, swin_small, swin_base.
--data: Path to the ImageNet dataset.
--epochs: Number of fine-tuning epochs. Recommended values: [30, 60, 90]; default=90.
--lr: Learning rate. Recommended values: [2e-7, 5e-7, 1e-6, 2e-6]; default=1e-6.
  • Example: Quantize and fine-tune DeiT-T:
python quant_train.py --model deit_tiny --data <YOUR_DATA_DIR> --epochs 30 --lr 5e-7 
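
Because several learning rates are recommended, it can be convenient to generate one command per value. The following is a minimal sketch, not part of the repository: the helper names build_commands and run_sweep and the path /path/to/imagenet are hypothetical, and only the four flags documented above are used. It assumes the script is launched from the repository root.

```python
import shlex
import subprocess
import sys

# Recommended learning rates, copied from the argument list above.
RECOMMENDED_LRS = ("2e-7", "5e-7", "1e-6", "2e-6")


def build_commands(model, data_dir, epochs=90, lrs=RECOMMENDED_LRS):
    """Return one quant_train.py command (as an argument list) per learning rate."""
    return [
        [sys.executable, "quant_train.py",
         "--model", model,
         "--data", data_dir,
         "--epochs", str(epochs),
         "--lr", lr]
        for lr in lrs
    ]


def run_sweep(model, data_dir, epochs=90, dry_run=True):
    """Print each command; launch it only when dry_run is False."""
    for cmd in build_commands(model, data_dir, epochs):
        print(shlex.join(cmd))
        if not dry_run:
            # Runs sequentially and stops at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "/path/to/imagenet" is a placeholder for <YOUR_DATA_DIR>.
    run_sweep("deit_tiny", "/path/to/imagenet", epochs=30, dry_run=True)
```

With dry_run=True the sketch only prints the commands, which makes it easy to check them before starting any training run.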

Results

The table below lists the Top-1 accuracy (%) on ImageNet that you should obtain with I-ViT.

Model    FP32    INT8 (I-ViT)   Diff.
ViT-S    81.39   81.27          -0.12
ViT-B    84.53   84.76          +0.23
DeiT-T   72.21   72.24          +0.03
DeiT-S   79.85   80.12          +0.27
DeiT-B   81.85   81.74          -0.11
Swin-T   81.35   81.50          +0.15
Swin-S   83.20   83.01          -0.19
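
The Diff. column is the INT8 accuracy minus the FP32 accuracy. As a small sketch for comparing your own runs against these numbers, the snippet below recomputes that column from the values in the table; the names RESULTS and accuracy_diffs are illustrative and do not come from the repository.

```python
# (FP32, INT8) Top-1 accuracy in %, copied from the results table above.
RESULTS = {
    "ViT-S": (81.39, 81.27),
    "ViT-B": (84.53, 84.76),
    "DeiT-T": (72.21, 72.24),
    "DeiT-S": (79.85, 80.12),
    "DeiT-B": (81.85, 81.74),
    "Swin-T": (81.35, 81.50),
    "Swin-S": (83.20, 83.01),
}


def accuracy_diffs(results):
    """Return INT8 minus FP32 per model, rounded to two decimals."""
    return {
        name: round(int8 - fp32, 2)
        for name, (fp32, int8) in results.items()
    }


if __name__ == "__main__":
    for name, diff in accuracy_diffs(RESULTS).items():
        # A positive value means the INT8 model scored higher than FP32.
        print(f"{name:7s} {diff:+.2f}")
```

To check a reproduction, replace the INT8 entry for a model with the accuracy from your own run and compare the resulting difference with the table.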

Citation

If you find this implementation useful for your work, please cite the following paper:

@article{li2022ivit,
  title={I-ViT: integer-only quantization for efficient vision transformer inference},
  author={Li, Zhikai and Gu, Qingyi},
  journal={arXiv preprint arXiv:2207.01405},
  year={2022}
}