
Ultralytics-YOLOv3-Cluster-NMS

Cluster-NMS integrated into YOLOv3 PyTorch

Our paper has been accepted by IEEE Transactions on Cybernetics (TCYB).

This is the code for our papers:

@Inproceedings{zheng2020diou,
  author    = {Zheng, Zhaohui and Wang, Ping and Liu, Wei and Li, Jinze and Ye, Rongguang and Ren, Dongwei},
  title     = {Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression},
  booktitle = {The AAAI Conference on Artificial Intelligence (AAAI)},
  year      = {2020},
}

@Article{zheng2021ciou,
  author  = {Zheng, Zhaohui and Wang, Ping and Ren, Dongwei and Liu, Wei and Ye, Rongguang and Hu, Qinghua and Zuo, Wangmeng},
  title   = {Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation},
  journal = {IEEE Transactions on Cybernetics},
  year    = {2021},
}

Introduction

In this paper, we propose Complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding box regression and Non-Maximum Suppression (NMS), leading to notable gains in average precision (AP) and average recall (AR) without sacrificing inference efficiency. In particular, we consider three geometric factors, i.e., overlap area, normalized central point distance and aspect ratio, which are crucial for measuring bounding box regression in object detection and instance segmentation. These three geometric factors are incorporated into CIoU loss for better distinguishing difficult regression cases. Training deep models with CIoU loss yields consistent AP and AR improvements over the widely adopted Ln-norm and IoU-based losses. Furthermore, we propose Cluster-NMS, where NMS during inference is done by implicitly clustering detected boxes, and which usually requires fewer iterations. Cluster-NMS is very efficient due to its pure GPU implementation, and geometric factors can be incorporated to improve both AP and AR. In our experiments, CIoU loss and Cluster-NMS are applied to state-of-the-art instance segmentation (e.g., YOLACT) and object detection (e.g., YOLOv3, SSD and Faster R-CNN) models.
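As a minimal sketch of how the three geometric factors combine (illustrative only, not the repo's exact implementation; boxes are assumed to be (x1, y1, x2, y2) tensors of the same shape):

```python
import math
import torch

def bbox_ciou(box1, box2, eps=1e-9):
    # Overlap area (the IoU term).
    inter_w = (torch.min(box1[..., 2], box2[..., 2]) - torch.max(box1[..., 0], box2[..., 0])).clamp(0)
    inter_h = (torch.min(box1[..., 3], box2[..., 3]) - torch.max(box1[..., 1], box2[..., 1])).clamp(0)
    inter = inter_w * inter_h

    w1, h1 = box1[..., 2] - box1[..., 0], box1[..., 3] - box1[..., 1]
    w2, h2 = box2[..., 2] - box2[..., 0], box2[..., 3] - box2[..., 1]
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union

    # Normalized central point distance: squared center distance over the
    # squared diagonal of the smallest enclosing box (the DIoU penalty).
    cw = torch.max(box1[..., 2], box2[..., 2]) - torch.min(box1[..., 0], box2[..., 0])
    ch = torch.max(box1[..., 3], box2[..., 3]) - torch.min(box1[..., 1], box2[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((box1[..., 0] + box1[..., 2] - box2[..., 0] - box2[..., 2]) ** 2
            + (box1[..., 1] + box1[..., 3] - box2[..., 1] - box2[..., 3]) ** 2) / 4

    # Aspect ratio consistency term.
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps))) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)

    return iou - rho2 / c2 - alpha * v  # CIoU; the loss is 1 - CIoU
```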

This repo focuses only on the NMS improvements, built on https://github.com/ultralytics/yolov3.

See the non_max_suppression function in utils/utils.py for our Cluster-NMS implementation.
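The core loop of Cluster-NMS can be sketched in a few lines of PyTorch (a simplified sketch, not the repo's exact code; torchvision.ops.box_iou is used here only as a convenient pairwise-IoU helper, and any matrix IoU computation works in its place):

```python
import torch
from torchvision.ops import box_iou  # any pairwise IoU helper works here

def cluster_nms(boxes, scores, iou_threshold=0.5, max_iter=200):
    # Sort boxes by descending score; Cluster-NMS assumes this ordering.
    order = scores.argsort(descending=True)
    boxes = boxes[order]
    iou = box_iou(boxes, boxes).triu_(diagonal=1)  # strict upper-triangular IoU matrix
    B = iou
    for _ in range(max_iter):
        A = B
        keep = (A.max(dim=0).values <= iou_threshold).float()  # boxes not suppressed so far
        B = iou * keep.unsqueeze(1)  # zero out the rows of suppressed boxes
        if B.equal(A):               # converged, usually after very few iterations
            break
    return order[keep.bool()]        # indices of kept boxes in the original ordering
```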

This directory contains PyTorch YOLOv3 software developed by Ultralytics LLC, and is freely available for redistribution under the GPL-3.0 license. For more information please visit https://www.ultralytics.com.

Description

The https://github.com/ultralytics/yolov3 repo contains inference and training code for YOLOv3 in PyTorch. The code works on Linux, macOS and Windows. Training is done on the COCO dataset by default: https://cocodataset.org/#home. Credit to Joseph Redmon for YOLO: https://pjreddie.com/darknet/yolo/.

Requirements

Python 3.7 or later, with all packages from pip install -U -r requirements.txt installed, including torch >= 1.5. Docker images come with all dependencies preinstalled. Docker requirements are:

  • Nvidia Driver >= 440.44
  • Docker Engine - CE >= 19.03
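A quick environment sanity check (illustrative only, not part of the repo):

```python
import torch

print(f'torch {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'device: {torch.cuda.get_device_name(0)}')
```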

mAP

Size   Model                    COCO mAP @0.5...0.95   COCO mAP @0.5
320    YOLOv3-tiny              14.0                   29.1
320    YOLOv3                   28.7                   51.8
320    YOLOv3-SPP               30.5                   52.3
320    YOLOv3-SPP-ultralytics   37.7                   56.8
416    YOLOv3-tiny              16.0                   33.0
416    YOLOv3                   31.2                   55.4
416    YOLOv3-SPP               33.9                   56.9
416    YOLOv3-SPP-ultralytics   41.2                   60.6
512    YOLOv3-tiny              16.6                   34.9
512    YOLOv3                   32.7                   57.7
512    YOLOv3-SPP               35.6                   59.5
512    YOLOv3-SPP-ultralytics   42.6                   62.4
608    YOLOv3-tiny              16.6                   35.4
608    YOLOv3                   33.1                   58.2
608    YOLOv3-SPP               37.0                   60.7
608    YOLOv3-SPP-ultralytics   43.1                   62.8

Cluster-NMS

Hardware

  • 2 GTX 1080 Ti
  • Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz

Evaluation command: python3 test.py --cfg yolov3-spp.cfg --weights yolov3-spp-ultralytics.pt

AP is reported on COCO 2014 minival.

Image Size   Model                    NMS                              FPS    box AP   box AP75   box AR100
608          YOLOv3-SPP-ultralytics   Fast NMS                         85.5   42.2     45.1       60.1
608          YOLOv3-SPP-ultralytics   Original NMS                     14.6   42.6     45.8       62.5
608          YOLOv3-SPP-ultralytics   DIoU-NMS                         7.9    42.7     46.2       63.4
608          YOLOv3-SPP-ultralytics   Original NMS (Torchvision)       95.2   42.6     45.8       62.5
608          YOLOv3-SPP-ultralytics   Cluster-NMS                      82.6   42.6     45.8       62.5
608          YOLOv3-SPP-ultralytics   Cluster-DIoU-NMS                 76.9   42.7     46.2       63.4
608          YOLOv3-SPP-ultralytics   Weighted-NMS                     11.2   42.9     46.4       62.7
608          YOLOv3-SPP-ultralytics   Weighted Cluster-NMS             68.0   42.9     46.4       62.7
608          YOLOv3-SPP-ultralytics   Weighted + Cluster-DIoU-NMS      64.9   43.1     46.8       63.7
608          YOLOv3-SPP-ultralytics   Merge + Torchvision NMS          88.5   42.8     46.3       63.0
608          YOLOv3-SPP-ultralytics   Merge + DIoU + Torchvision NMS   82.5   43.0     46.6       63.2

Conclusion

  • Merge NMS is a simplified version of Weighted-NMS. It uses only the score vector to weight box coordinates, rather than combining score and IoU. (Refer to CAD for the details of Weighted-NMS.)

  • We further incorporate DIoU into NMS for YOLOv3, which achieves higher AP and AR.

  • Note that Torchvision NMS is the fastest, owing to its CUDA implementation and engineering accelerations (such as computing only the upper triangular IoU matrix). However, our Cluster-NMS requires fewer iterations and can be further accelerated by adopting the same engineering tricks. Glenn Jocher's Torchvision NMS + Merge was completed almost at the same time as the work in our paper: first run Torchvision NMS, then convert its output to a vector and multiply it with the IoU matrix. Moreover, for Merge NMS the IoU matrix need not be square (n*n); it can be m*n, where m is the number of boxes NMS outputs, saving further time (see the sketch after this list).

  • Currently, Torchvision NMS uses IoU as its criterion, not DIoU. Directly replacing IoU with DIoU in Original NMS costs much more time due to its sequential operation, whereas Cluster-DIoU-NMS significantly speeds up DIoU-NMS while producing exactly the same result.

  • Torchvision NMS is a function available in Torchvision >= 0.3, whereas our Cluster-NMS can be applied to projects using older versions of Torchvision, or to other deep learning frameworks, as long as matrix operations are available: no extra imports, no compilation, fewer iterations, fully GPU-accelerated, and better performance.
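A sketch of the Merge + Torchvision NMS variant described above (illustrative, assuming torchvision >= 0.3; thresholds and details may differ from the repo's non_max_suppression). Note the IoU matrix is m*n rather than n*n:

```python
import torch
import torchvision

def merge_nms(boxes, scores, iou_threshold=0.5):
    # Standard (Torchvision) NMS first: keep holds m indices out of n boxes.
    keep = torchvision.ops.nms(boxes, scores, iou_threshold)
    # m x n IoU matrix between kept boxes and all candidates.
    iou = torchvision.ops.box_iou(boxes[keep], boxes)
    # Score-only weighting of overlapping neighbours (Merge, not Weighted-NMS).
    weights = (iou > iou_threshold).float() * scores.unsqueeze(0)
    # Replace each kept box by the score-weighted average of its overlapping boxes.
    merged = (weights @ boxes) / weights.sum(dim=1, keepdim=True)
    return merged, scores[keep]
```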

Citation

If this repo is useful in your research, please cite the BibTeX entries given at the top of this README.