pytorch_quantization

With this repository, you can try model quantization of MobileNetV2 trained on the CIFAR10 dataset. Currently, post training static quantization and quantization aware training are supported.

| model | quantization method | CIFAR10 val accuracy [%] | model size [MB] |
|---|---|---|---|
| MobileNetV2 (float) | - | 96.36 | 14 |
| MobileNetV2 (int8) | post training static quantization | 95.53 | 3.8 |
| MobileNetV2 (int8) | quantization aware training | 96.30 | 3.8 |

Requirements

  • Ubuntu OS
  • CUDA (tested with 11.6)
  • Python3 (tested with 3.8.8)

See requirements.txt for additional requirements.

Other versions may work, but note that torch>=1.3.0 is required to use the PyTorch quantization library.
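As a quick sanity check, you can confirm that your torch build supports quantization before running anything (a minimal sketch, not part of this repository):

import torch

# Quantization requires torch>=1.3.0.
print(torch.__version__)

# Quantized backends available in this build
# (typically 'fbgemm' on x86, 'qnnpack' on ARM).
print(torch.backends.quantized.supported_engines)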

Setup

$ pip install -r requirements.txt

Before training, sign up for W&B and create a new project named pytorch_model_quantization.

Get your API key from W&B > Settings > API keys and then:

$ echo 'WANDB_API_KEY = "xxxx"' > .env  # replace xxxx with your own W&B API key

train.py will load the API key from .env to send training logs to W&B.
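For reference, loading the key typically looks like this (a minimal sketch using python-dotenv; the actual logic in train.py may differ):

import os
from dotenv import load_dotenv
import wandb

# Read WANDB_API_KEY from .env into the process environment.
load_dotenv()

# Authenticate with W&B so the run can stream training logs.
wandb.login(key=os.environ["WANDB_API_KEY"])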

Pretrained weights

Pretrained weights are available:

$ unzip models_v2.zip
  • models/exp_2000/model_best.pth: float model
  • models/exp_2001/model_best.pth: model trained with quantization aware training
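Loading the float checkpoint would look roughly like this (a sketch; the MobileNetV2 import path and the assumption that the checkpoint stores a plain state_dict are mine, not taken from this repository):

import torch
from models import MobileNetV2  # hypothetical import path

model = MobileNetV2(num_classes=10)  # CIFAR10 has 10 classes

# map_location lets a GPU-trained checkpoint load on a CPU-only machine.
state_dict = torch.load("models/exp_2000/model_best.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()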

Post training static quantization

You need to train a float model first (this step can be skipped if you use the pretrained weights):

$ EXP_ID=2000
$ python train.py $EXP_ID --mode normal --lr 0.005 --batch_size 64

The trained weights are saved to models/exp_2000/model_best.pth.

To evaluate this model:

$ python test.py $EXP_ID --mode normal

You can apply post training static quantization to this float model:

$ python test.py $EXP_ID --mode ptq --replace_relu --fuse_model
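Under the hood, this follows PyTorch's standard eager-mode post training static quantization workflow. Roughly (a sketch under assumptions, not the exact code in test.py: fuse_model() is assumed to be a helper defined on the model, the model is assumed to wrap its forward pass with QuantStub/DeQuantStub, and calibration_loader is any representative DataLoader):

import torch

model.eval()

# Fuse Conv+BN+ReLU blocks so each runs as a single quantized op
# (what --fuse_model enables; --replace_relu swaps ReLU6 for ReLU,
# which the quantized backends support).
model.fuse_model()

# Attach observers configured for the x86 'fbgemm' backend.
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)

# Calibrate: run a few batches so the observers record
# activation ranges (no backprop needed).
with torch.no_grad():
    for images, _ in calibration_loader:
        model(images)

# Swap the observed float modules for real int8 modules.
torch.quantization.convert(model, inplace=True)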

To compare the model sizes:

$ ls -lh models/exp_2000/scripted_*
...
-rw-r--r-- 1 kimura kimura  14M May 27 04:22 scripted_model_normal.pth  # float
-rw-r--r-- 1 kimura kimura 3.8M May 27 04:37 scripted_model_ptq-relu-fused.pth  # quantized (post training static quantization)
...
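The scripted_*.pth files appear to be TorchScript exports, which is what makes the on-disk comparison meaningful: the quantized export stores int8 weights. Producing one looks roughly like this (a sketch):

import torch

# Serialize the model structure together with its weights.
scripted = torch.jit.script(model)
scripted.save("models/exp_2000/scripted_model_normal.pth")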

Quantization aware training

To run quantization aware training (this step can be skipped if you use the pretrained weights):

$ EXP_ID=2001
$ python train.py $EXP_ID --mode qat --replace_relu --fuse_model --lr 0.005 --batch_size 64

The trained weights are saved to models/exp_2001/model_best.pth.
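QAT differs from post training quantization mainly in that fake-quantization is inserted before fine-tuning, so the weights learn to compensate for int8 rounding. Roughly (a sketch, not the exact code in train.py; fuse_model() is the same assumed helper as in the PTQ sketch above):

import torch

model.train()
model.fuse_model()

# The QAT qconfig inserts fake-quant modules that simulate int8
# rounding in the forward pass while keeping float gradients.
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
torch.quantization.prepare_qat(model, inplace=True)

# ... fine-tune as usual; the simulated quantization noise lets
# the weights adapt ...

# After training, convert to a real int8 model for inference.
model.eval()
quantized_model = torch.quantization.convert(model)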

To evaluate this model:

$ python test.py $EXP_ID --mode qat --replace_relu --fuse_model

To check the model size:

$ ls -lh models/exp_2001/scripted_*
...
-rw-r--r-- 1 kimura kimura 3.8M May 27 07:52 scripted_model_ptq-relu-fused.pth  # quantized (quantization aware training)
...

TODOs

  • Add a table to show model accuracy and performance
  • Add more options for QAT (observer, etc.)
  • Add models
  • Finish docstrings
