Our paper is accepted by IEEE Transactions on Cybernetics (TCYB).
- Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
- Enhancing Geometric Factors into Model Learning and Inference for Object Detection and Instance Segmentation
@Inproceedings{zheng2020diou,
author = {Zheng, Zhaohui and Wang, Ping and Liu, Wei and Li, Jinze and Ye, Rongguang and Ren, Dongwei},
title = {Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression},
booktitle = {The AAAI Conference on Artificial Intelligence (AAAI)},
year = {2020},
}
@Article{zheng2021ciou,
author = {Zheng, Zhaohui and Wang, Ping and Ren, Dongwei and Liu, Wei and Ye, Rongguang and Hu, Qinghua and Zuo, Wangmeng},
title = {Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation},
booktitle = {IEEE Transactions on Cybernetics},
year = {2021},
}
In this paper, we propose Complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding box regression and Non-Maximum Suppression (NMS), leading to notable gains of average precision (AP) and average recall (AR), without the sacrifice of inference efficiency. In particular, we consider three geometric factors, i.e., overlap area, normalized central point distance and aspect ratio, which are crucial for measuring bounding box regression in object detection and instance segmentation. The three geometric factors are then incorporated into CIoU loss for better distinguishing difficult regression cases. The training of deep models using CIoU loss results in consistent AP and AR improvements in comparison to widely adopted Ln-norm loss and IoU-based loss. Furthermore, we propose Cluster-NMS, where NMS during inference is done by implicitly clustering detected boxes and usually requires less iterations. Cluster-NMS is very efficient due to its pure GPU implementation, and geometric factors can be incorporated to improve both AP and AR. In the experiments, CIoU loss and Cluster-NMS have been applied to state-of-the-art instance segmentation (e.g., YOLACT), and object detection (e.g., YOLO v3, SSD and Faster R-CNN) models.
This repo only focuses on NMS improvement based on https://github.com/ultralytics/yolov3.
See non_max_suppression
function of utils/utils.py for our Cluster-NMS implementation.
This directory contains PyTorch YOLOv3 software developed by Ultralytics LLC, and is freely available for redistribution under the GPL-3.0 license. For more information please visit https://www.ultralytics.com.
The https://github.com/ultralytics/yolov3 repo contains inference and training code for YOLOv3 in PyTorch. The code works on Linux, MacOS and Windows. Training is done on the COCO dataset by default: https://cocodataset.org/#home. Credit to Joseph Redmon for YOLO: https://pjreddie.com/darknet/yolo/.
Python 3.7 or later with all pip install -U -r requirements.txt
packages including torch >= 1.5
. Docker images come with all dependencies preinstalled. Docker requirements are:
- Nvidia Driver >= 440.44
- Docker Engine - CE >= 19.03
Size | COCO mAP @0.5...0.95 |
COCO mAP @0.5 |
|
---|---|---|---|
YOLOv3-tiny YOLOv3 YOLOv3-SPP YOLOv3-SPP-ultralytics |
320 | 14.0 28.7 30.5 37.7 |
29.1 51.8 52.3 56.8 |
YOLOv3-tiny YOLOv3 YOLOv3-SPP YOLOv3-SPP-ultralytics |
416 | 16.0 31.2 33.9 41.2 |
33.0 55.4 56.9 60.6 |
YOLOv3-tiny YOLOv3 YOLOv3-SPP YOLOv3-SPP-ultralytics |
512 | 16.6 32.7 35.6 42.6 |
34.9 57.7 59.5 62.4 |
YOLOv3-tiny YOLOv3 YOLOv3-SPP YOLOv3-SPP-ultralytics |
608 | 16.6 33.1 37.0 43.1 |
35.4 58.2 60.7 62.8 |
- mAP@0.5 run at
--iou-thr 0.5
, mAP@0.5...0.95 run at--iou-thr 0.7
- Darknet results: https://arxiv.org/abs/1804.02767
- 2 GTX 1080 Ti
- Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz
Evaluation command: python3 test.py --cfg yolov3-spp.cfg --weights yolov3-spp-ultralytics.pt
AP reports on coco 2014 minival
.
Image Size | Model | NMS | FPS | box AP | box AP75 | box AR100 |
---|---|---|---|---|---|---|
608 | YOLOv3-SPP-ultralytics | Fast NMS | 85.5 | 42.2 | 45.1 | 60.1 |
608 | YOLOv3-SPP-ultralytics | Original NMS | 14.6 | 42.6 | 45.8 | 62.5 |
608 | YOLOv3-SPP-ultralytics | DIoU-NMS | 7.9 | 42.7 | 46.2 | 63.4 |
608 | YOLOv3-SPP-ultralytics | Original NMS Torchvision | 95.2 | 42.6 | 45.8 | 62.5 |
608 | YOLOv3-SPP-ultralytics | Cluster-NMS | 82.6 | 42.6 | 45.8 | 62.5 |
608 | YOLOv3-SPP-ultralytics | Cluster-DIoU-NMS | 76.9 | 42.7 | 46.2 | 63.4 |
608 | YOLOv3-SPP-ultralytics | Weighted-NMS | 11.2 | 42.9 | 46.4 | 62.7 |
608 | YOLOv3-SPP-ultralytics | Weighted Cluster-NMS | 68.0 | 42.9 | 46.4 | 62.7 |
608 | YOLOv3-SPP-ultralytics | Weighted + Cluster-DIoU-NMS | 64.9 | 43.1 | 46.8 | 63.7 |
608 | YOLOv3-SPP-ultralytics | Merge + Torchvision NMS | 88.5 | 42.8 | 46.3 | 63.0 |
608 | YOLOv3-SPP-ultralytics | Merge + DIoU + Torchvision NMS | 82.5 | 43.0 | 46.6 | 63.2 |
-
Merge NMS is a simplified version of Weighted-NMS. It just use score vector for weighted coordinates, not combine score and IoU. (Refer to CAD for the details of Weighted-NMS.)
-
We further incorporate DIoU into NMS for YOLOv3 which can get higher AP and AR.
-
Note that Torchvision NMS has the fastest speed, that is owing to CUDA implementation and engineering accelerations (like upper triangular IoU matrix only). However, our Cluster-NMS requires less iterations for NMS and can also be further accelerated by adopting engineering tricks. Almost completed at the same time as the work of our paper is Glenn Jocher's Torchvision NMS + Merge. First, we do Torchvision NMS, then convert the output to vector to multiply the IoU matrix. Also, for Merge NMS, the IoU matrix is no need to be square shape
n*n
. It can bem*n
to save more time, wherem
is the boxes that NMS outputs. -
Currently, Torchvision NMS use IoU as criterion, not DIoU. However, if we directly replace IoU with DIoU in Original NMS, it will costs much more time due to the sequence operation. Now, Cluster-DIoU-NMS will significantly speed up DIoU-NMS and obtain exactly the same result.
-
Torchvision NMS is a function in Torchvision>=0.3, and our Cluster-NMS can be applied to any projects that use low version of Torchvision and other deep learning frameworks as long as it can do matrix operations. No other import, no need to compile, less iteration, fully GPU-accelerated and better performance.