This repository demonstrates quantization-aware training (QAT) fine-tuning of yolov5 using TensorRT's pytorch_quantization tool.
Using a Docker environment is recommended.
Download the Docker image:
docker pull longxiaowyh/yolov5:v2.0
Create the Docker container:
nvidia-docker run -itu root:root --name yolov5 --gpus all -v /your_path:/target_path -v /tmp/.X11-unix/:/tmp/.X11-unix/ -e DISPLAY=unix$DISPLAY -e GDK_SCALE -e GDK_DPI_SCALE -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility --shm-size=64g longxiaowyh/yolov5:v2.0 /bin/bash
1. Clone and apply patch
git clone git@github.com:yhwang-hub/yolov7_quantization.git
2. Install dependencies
pip install pytorch-quantization --extra-index-url https://pypi.ngc.nvidia.com
3. Prepare the COCO dataset
.
├── annotations
│   ├── captions_train2017.json
│   ├── captions_val2017.json
│   ├── instances_train2017.json
│   ├── instances_val2017.json
│   ├── person_keypoints_train2017.json
│   └── person_keypoints_val2017.json
├── coco -> coco
├── coco128
│   ├── images
│   ├── labels
│   ├── LICENSE
│   └── README.txt
├── images
│   ├── train2017
│   └── val2017
├── labels
│   ├── train2017
│   ├── train2017.cache
│   └── val2017
├── train2017.cache
├── train2017.txt
├── val2017.cache
└── val2017.txt
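Before launching the scripts, it can help to confirm that the dataset directory matches the layout above. The following is a minimal sketch, not part of this repository: the helper name `missing_coco_entries` and the `EXPECTED` list are hypothetical, and the list is only an assumed subset of the tree (which entries ptq.py and qat.py actually read is not confirmed here).

```python
from pathlib import Path

# Assumption: a subset of the tree above that the scripts are likely to need.
# Adjust this list to whatever ptq.py / qat.py really read.
EXPECTED = [
    "annotations/instances_val2017.json",
    "images/train2017",
    "images/val2017",
    "labels/train2017",
    "labels/val2017",
    "train2017.txt",
    "val2017.txt",
]


def missing_coco_entries(cocodir, expected=EXPECTED):
    """Return the expected relative paths that do not exist under cocodir."""
    root = Path(cocodir)
    return [rel for rel in expected if not (root / rel).exists()]


# Example: check the current directory and report anything that is absent.
for rel in missing_coco_entries("."):
    print(f"missing: {rel}")
```

Pass the same directory that is given to `--cocodir`; an empty result means every listed entry was found.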
python ptq.py --weights ./weights/yolov5s.pt --cocodir /home/wyh/disk/coco/ --batch_size 5 --save_ptq True --eval_origin --eval_ptq --sensitive True
Then modify the ignore_layers parameter in ptq.py as follows (a raw string keeps the backslashes in the regex intact):
parser.add_argument("--ignore_layers", type=str, default=r"model\.24\.m\.(.*)", help="regex")
python ptq.py --weights ./weights/yolov5s.pt --cocodir /home/wyh/disk/coco/ --batch_size 5 --save_ptq True --eval_origin --eval_ptq --sensitive False
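To see which layers the ignore_layers pattern selects, here is a small sketch. It assumes the script matches the pattern against module names from the start of the name (as `re.match` does); the actual matching logic in ptq.py may differ. The helper `should_ignore` and the layer names are hypothetical and for illustration only.

```python
import re

# Same pattern as the ignore_layers default, written as a raw string.
IGNORE_LAYERS = r"model\.24\.m\.(.*)"


def should_ignore(layer_name, pattern=IGNORE_LAYERS):
    """Return True if the layer name matches the ignore pattern from its start."""
    return re.match(pattern, layer_name) is not None


# Hypothetical module names, in the style of a YOLOv5 model.
names = [
    "model.0.conv",
    "model.23.cv3.conv",
    "model.24.m.0",
    "model.24.m.2",
    "model.240.m.0",  # not matched: the escaped dot must follow "24" directly
]

ignored = [name for name in names if should_ignore(name)]
print(ignored)  # -> ['model.24.m.0', 'model.24.m.2']
```

Escaping the dots matters: an unescaped `.` matches any character, so the pattern would also accept names that merely resemble `model.24.m`.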
python qat.py --weights ./weights/yolov5s.pt --cocodir /home/wyh/disk/coco/ --batch_size 5 --save_ptq True --save_qat True --eval_origin --eval_ptq --eval_qat
This script includes the following steps:
- Insert Q&DQ nodes to get a fake-quant PyTorch model. The pytorch_quantization tool can insert Q/DQ nodes automatically, but for the yolov7 model the result does not reach the same performance as PTQ: in explicit mode (QAT mode), TensorRT uses the placement of the Q/DQ nodes to restrict the precision of the model, and some of the automatically added Q&DQ nodes cannot be fused with other layers, which causes extra, useless precision conversions. In our script, we found some rules and restrictions for yolov7, so the Q/DQ nodes are analyzed and configured automatically in a rule-based manner. This keeps their placement optimal under TensorRT and ensures that all nodes run in INT8 (confirmed with the trt-engine-explorer tool, see scripts/draw-engine.py). For details, please refer to quantization/rules.py. For guidance on Q&DQ insertion, please refer to Guidance_of_QAT_performance_optimization.
- PTQ calibration. After inserting the Q&DQ nodes, we recommend running PTQ calibration first. In our experiments, Histogram (MSE) was the best PTQ calibration method for yolov7. Note: if you are satisfied with the PTQ result, you can skip QAT.
- QAT training. After calibration, fine-tune the model with QAT. Once the accuracy is satisfactory, save the weights to files.
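The first two steps can be illustrated with a small, library-free sketch. It does not use pytorch_quantization and is not the code in qat.py: it only shows what a Q/DQ pair does to a value (symmetric, per-tensor INT8 is assumed) and how an MSE-style calibration can pick the clipping threshold `amax`. The real tool works on histograms of activations; this sketch searches over raw values for simplicity, and all function names are hypothetical.

```python
import random


def fake_quantize(values, amax, num_bits=8):
    """Quantize to a signed integer grid and dequantize back (Q followed by DQ)."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for INT8
    scale = amax / qmax
    out = []
    for v in values:
        q = round(v / scale)                  # Q: float -> integer grid
        q = max(-qmax, min(qmax, q))          # clamp to the representable range
        out.append(q * scale)                 # DQ: integer -> float
    return out


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def calibrate_amax_mse(values, num_candidates=100, num_bits=8):
    """Pick the amax candidate that minimizes the fake-quantization MSE."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        raise ValueError("cannot calibrate an all-zero tensor")
    best_amax, best_err = max_abs, float("inf")
    for i in range(1, num_candidates + 1):
        amax = max_abs * (i / num_candidates)  # last candidate equals max_abs
        err = mse(values, fake_quantize(values, amax, num_bits))
        if err < best_err:
            best_amax, best_err = amax, err
    return best_amax, best_err


# Synthetic "activations": mostly small values plus a single outlier.
random.seed(0)
acts = [random.gauss(0.0, 1.0) for _ in range(2000)] + [12.0]

max_abs = max(abs(v) for v in acts)
err_max = mse(acts, fake_quantize(acts, max_abs))
amax, err = calibrate_amax_mse(acts)
print(f"max calibration: amax={max_abs:.2f}, mse={err_max:.6f}")
print(f"MSE calibration: amax={amax:.2f}, mse={err:.6f}")
```

Because the largest candidate equals the maximum absolute value, the MSE search can never do worse than plain max calibration on the calibration data; whether it helps end-to-end accuracy still has to be checked with the evaluation flags of the scripts.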