/EasyQuant

EasyQuant(EQ) is an efficient and simple post-training quantization method via effectively optimizing the scales of weights and activations.

Primary LanguagePythonOtherNOASSERTION

EasyQuant: Post-training Quantization via Scale Optimization

EasyQuant(EQ) is an efficient and simple post-training quantization method via effectively optimizing the scales of weights and activations. Our paper is available on arXiv

Requirements

eq-ncnn

caffe

pip install -r requirements.txt

Updates:

  • 06/25/2020: We have released EasyQuant.pdf paper and eq-ncnn.
  • 06/24/2020: We have released VGG16 example.

Data Preparation

First, for ImageNet1k classification task, please download ImageNet2012. We random sampled 3000 calibration images from ImageNet val set to data/calib.txt for KLD quantization and select 50 samples from calib to data/list50.txt for EasyQuant scale finetuning. Then we will evaluate quantified models on val set.

How to Run

  1. Get caffe and ncnn ready
  2. Build python_ncnn
# 
cd python_ncnn
# modify ncnn build path in Makefile
make -j8
  1. Run VGG16 example
cd ..
sh example/vgg16/run.sh

This following 6 steps will be performed in run.sh.

# download vgg16 model and upgrade proto and caffemodel files
sh model/vgg16/net_file_upgrade.sh

# generate scale of weight and activation use quantation tools
sh example/vgg16/run_scale_quantation.sh

# get ncnn param bin from scale table
sh example/vgg16/run_caffe2ncnn.sh

# infer layer blob shape
sh example/vgg16/run_infer_shape.sh

# run scale fine tuning
sh example/vgg16/run_scale_fine_tuning.sh

# run validtion on imagenet val
sh example/vgg16/run_validation.sh

Results

Validation Results

1. Classification on ImageNet2012 validation dataset for different convolutional models in context of both INT8 and INT7 post-training quantization.

Models FP32 INT8-TRT INT8-EQ INT7-TRT INT7-EQ
SqueezeNetV1.1 56.56 56.24 56.28 54.88 56.08
MobileNetV1 69.33 68.74 68.84 66.97 68.26
VGG16 70.97 70.95 70.97 70.92 70.96
ResNet50 75.20 75.04 75.13 72.78 75.04

2. Object detection on VOC2007 task for SSD models with bachbone SqueezeNet and MobileNet V1.

Models FP32 INT8-TRT INT8-EQ INT7-TRT INT7-EQ
SqueezeNet-SSD 62.00 61.45 62.05 60.01 61.62
MobileNet-SSD 72.04 69.79 71.39 63.88 68.79

3. Verification performance for InsightFace model MobileFaceNet on 7 most common validation dataset

Models FP32 INT8-TRT INT8-EQ INT7-TRT INT7-EQ
lfw 99.45 99.36 99.48 99.28 99.36
agedb_30 95.78 95.23 95.38 95.03 95.73
calfw 95.05 94.76 94.88 94.75 94.68
cfp_ff 99.50 99.50 99.61 99.44 99.60
cfp_fp 89.77 89.17 90.04 88.47 89.87
cplfw 86.45 85.58 86.03 85.91 86.76
vgg2_fp 90.64 89.70 90.50 89.64 90.44

4. Compare our method with more complex QAT(quantization aware trainin) approach in 8 bit width.

Methods MobileNetV1-FP32 MobileNetV1-INT8 ResNet50-FP32 ResNet50-INT8
EQ 69.33 68.84 75.20 75.13
QAT 70.90 70.70 75.20 75.00

Speed Tests

INT7 Post-training Inference VS INT8 on real devices. We implement our efficient designs on INT7 post-training inference which well be released in eq-ncnn

1. The latency (ms) performance on RK3399, whose inside is a 1.5 GHz 64-bit Quad-core ARM Cortex-A53. #k means k threads.

Models TRT-INT8(#1) EQ-INT7(#1) TRT-INT8(#4) EQ-INT7(#4)
SqueezeNetV1.1 180 120 66 44
MobileNetV1 234 189 65 57
VGG16 3326 2873 1423 1252
ResNet50 1264 993 415 300

2. The latency (ms) performance on RK3399, which inside is a 1.8 GHz 64-bit Dual-core ARM Cortex-A72. #k means k threads.

Models TRT-INT8(#1) EQ-INT7(#1) TRT-INT8(#2) EQ-INT7(#2)
SqueezeNetV1.1 79 57 54 37
MobileNetV1 105 84 56 46
VGG16 1659 1385 1034 849
ResNet50 559 463 338 262

Contributing

PRs accepted.

License and Citation

BSD3 © DeepGlint

@inproceedings{easyquant,
    title={EasyQuant: Post-training Quantization via Scale Optimization},
    author={Di Wu, Qi Tang, Yongle Zhao, Ming Zhang, Debing Zhang, Ying Fu},
    year={2020}
}