deep learning object detection

A paper list of object detection using deep learning. I compiled this page with reference to this survey paper and through a lot of searching.

Last updated: 2020/01/13

Update log

2018/9/18 - update all recent papers and add a diagram of the history of object detection using deep learning.
2018/9/26 - update code links for papers (official and unofficial).
2018/october - update 5 papers and performance table.
2018/november - update 9 papers.
2018/december - update 8 papers and the performance table, and add a new diagram (2019 version!!).
2019/january - update 4 papers and add commonly used datasets.
2019/february - update 3 papers.
2019/march - update figure and code links.
2019/april - remove author's names and update ICLR 2019 & CVPR 2019 papers.
2019/may - update CVPR 2019 papers.
2019/june - update CVPR 2019 papers and dataset paper.
2019/july - update BMVC 2019 papers and some of ICCV 2019 papers.
2019/september - update NeurIPS 2019 papers and ICCV 2019 papers.
2019/november - update some of AAAI 2020 papers and other papers.
2020/january - update ICLR 2020 papers and other papers.

Paper list from 2014 to now (2020)

Papers highlighted in red are the ones I consider "must-read". That is only my personal opinion, though. The other papers are important too, so I recommend reading them if you have time.

Performance table

FPS (speed) depends on the hardware (e.g., CPU, GPU, RAM), so the numbers reported by different papers are hard to compare fairly. A fair comparison would require measuring every model on hardware with equivalent specifications, which is difficult and time-consuming.
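As a rough illustration of what measuring on equivalent hardware involves, here is a minimal timing sketch. The names `measure_fps` and `dummy_detector` are hypothetical and not taken from any paper in this list; a real benchmark would replace the dummy with an actual model, and with a GPU model the device would also need to be synchronized before reading the clock.

```python
import time


def measure_fps(detect, images, warmup=2, runs=3):
    """Return images processed per second by `detect` over `runs` timed passes."""
    # Warm-up passes are not timed, so one-time costs do not skew the result.
    for _ in range(warmup):
        for image in images:
            detect(image)

    start = time.perf_counter()
    for _ in range(runs):
        for image in images:
            detect(image)
    elapsed = time.perf_counter() - start

    return (runs * len(images)) / elapsed


def dummy_detector(image):
    # Hypothetical stand-in for a real model: a small, fixed amount of work.
    return sum(image)


if __name__ == "__main__":
    images = [list(range(1000)) for _ in range(50)]
    print(f"{measure_fps(dummy_detector, images):.1f} FPS")
```

Numbers from a script like this are only comparable between models when the same machine, input size, and batch size are used for all of them.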

Detector | VOC07 (mAP@IoU=0.5) | VOC12 (mAP@IoU=0.5) | COCO (mAP@IoU=0.5:0.95) | Published In
R-CNN | 58.5 | - | - | CVPR'14
SPP-Net | 59.2 | - | - | ECCV'14
MR-CNN | 78.2 (07+12) | 73.9 (07+12) | - | ICCV'15
Fast R-CNN | 70.0 (07+12) | 68.4 (07++12) | 19.7 | ICCV'15
Faster R-CNN | 73.2 (07+12) | 70.4 (07++12) | 21.9 | NIPS'15
YOLO v1 | 66.4 (07+12) | 57.9 (07++12) | - | CVPR'16
G-CNN | 66.8 | 66.4 (07+12) | - | CVPR'16
AZNet | 70.4 | - | 22.3 | CVPR'16
ION | 80.1 | 77.9 | 33.1 | CVPR'16
HyperNet | 76.3 (07+12) | 71.4 (07++12) | - | CVPR'16
OHEM | 78.9 (07+12) | 76.3 (07++12) | 22.4 | CVPR'16
MPN | - | - | 33.2 | BMVC'16
SSD | 76.8 (07+12) | 74.9 (07++12) | 31.2 | ECCV'16
GBDNet | 77.2 (07+12) | - | 27.0 | ECCV'16
CPF | 76.4 (07+12) | 72.6 (07++12) | - | ECCV'16
R-FCN | 79.5 (07+12) | 77.6 (07++12) | 29.9 | NIPS'16
DeepID-Net | 69.0 | - | - | TPAMI'16
NoC | 71.6 (07+12) | 68.8 (07+12) | 27.2 | TPAMI'16
DSSD | 81.5 (07+12) | 80.0 (07++12) | 33.2 | arXiv'17
TDM | - | - | 37.3 | CVPR'17
FPN | - | - | 36.2 | CVPR'17
YOLO v2 | 78.6 (07+12) | 73.4 (07++12) | - | CVPR'17
RON | 77.6 (07+12) | 75.4 (07++12) | 27.4 | CVPR'17
DeNet | 77.1 (07+12) | 73.9 (07++12) | 33.8 | ICCV'17
CoupleNet | 82.7 (07+12) | 80.4 (07++12) | 34.4 | ICCV'17
RetinaNet | - | - | 39.1 | ICCV'17
DSOD | 77.7 (07+12) | 76.3 (07++12) | - | ICCV'17
SMN | 70.0 | - | - | ICCV'17
Light-Head R-CNN | - | - | 41.5 | arXiv'17
YOLO v3 | - | - | 33.0 | arXiv'18
SIN | 76.0 (07+12) | 73.1 (07++12) | 23.2 | CVPR'18
STDN | 80.9 (07+12) | - | - | CVPR'18
RefineDet | 83.8 (07+12) | 83.5 (07++12) | 41.8 | CVPR'18
SNIP | - | - | 45.7 | CVPR'18
Relation-Network | - | - | 32.5 | CVPR'18
Cascade R-CNN | - | - | 42.8 | CVPR'18
MLKP | 80.6 (07+12) | 77.2 (07++12) | 28.6 | CVPR'18
Fitness-NMS | - | - | 41.8 | CVPR'18
RFBNet | 82.2 (07+12) | - | - | ECCV'18
CornerNet | - | - | 42.1 | ECCV'18
PFPNet | 84.1 (07+12) | 83.7 (07++12) | 39.4 | ECCV'18
Pelee | 70.9 (07+12) | - | - | NIPS'18
HKRM | 78.8 (07+12) | - | 37.8 | NIPS'18
M2Det | - | - | 44.2 | AAAI'19
R-DAD | 81.2 (07++12) | 82.0 (07++12) | 43.1 | AAAI'19
ScratchDet | 84.1 (07++12) | 83.6 (07++12) | 39.1 | CVPR'19
Libra R-CNN | - | - | 43.0 | CVPR'19
Reasoning-RCNN | 82.5 (07++12) | - | 43.2 | CVPR'19
FSAF | - | - | 44.6 | CVPR'19
AmoebaNet + NAS-FPN | - | - | 47.0 | CVPR'19
Cascade-RetinaNet | - | - | 41.1 | BMVC'19
TridentNet | - | - | 48.4 | ICCV'19
DAFS | 85.3 (07+12) | 83.1 (07++12) | 40.5 | ICCV'19
Auto-FPN | 81.8 (07++12) | - | 40.5 | ICCV'19
FCOS | - | - | 44.7 | ICCV'19
FreeAnchor | - | - | 44.8 | NeurIPS'19
DetNAS | 81.5 (07++12) | - | 42.0 | NeurIPS'19
NATS | - | - | 42.0 | NeurIPS'19
AmoebaNet + NAS-FPN + AA | - | - | 50.7 | arXiv'19
EfficientDet | - | - | 51.0 | arXiv'19
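The column headers above refer to IoU thresholds: the VOC columns count a detection as correct at IoU >= 0.5, while the COCO column averages AP over thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below shows only the IoU part, assuming boxes are given as `(x1, y1, x2, y2)` corner coordinates. The function and constant names are illustrative, and this is not a full mAP implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten COCO-style thresholds: 0.50, 0.55, ..., 0.95.
COCO_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matches_at(box_pred, box_gt, threshold=0.5):
    """True if the predicted box overlaps the ground truth enough to count."""
    return iou(box_pred, box_gt) >= threshold


if __name__ == "__main__":
    pred, gt = (0, 0, 10, 10), (0, 0, 10, 8)
    # Thresholds at which this single prediction would still count as a match.
    print([t for t in COCO_THRESHOLDS if matches_at(pred, gt, t)])
```

A prediction that matches at 0.5 can still fail at the stricter thresholds, which is one reason the COCO numbers in the table are much lower than the VOC numbers for the same detector.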

2014

2015

2016

2017

2018

2019

  • [M2Det] M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | [AAAI' 19] |[pdf] | [official code - pytorch]

  • [R-DAD] Object Detection based on Region Decomposition and Assembly | [AAAI' 19] |[pdf]

  • [CAMOU] CAMOU: Learning Physical Vehicle Camouflages to Adversarially Attack Detectors in the Wild | [ICLR' 19] |[pdf]

  • Feature Intertwiner for Object Detection | [ICLR' 19] |[pdf]

  • [GIoU] Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression | [CVPR' 19] |[pdf]

  • Automatic adaptation of object detectors to new domains using self-training | [CVPR' 19] |[pdf]

  • [Libra R-CNN] Libra R-CNN: Balanced Learning for Object Detection | [CVPR' 19] |[pdf]

  • [FSAF] Feature Selective Anchor-Free Module for Single-Shot Object Detection | [CVPR' 19] |[pdf]

  • [ExtremeNet] Bottom-up Object Detection by Grouping Extreme and Center Points | [CVPR' 19] |[pdf] | [official code - pytorch]

  • [C-MIL] C-MIL: Continuation Multiple Instance Learning for Weakly Supervised Object Detection | [CVPR' 19] |[pdf] | [official code - torch]

  • [ScratchDet] ScratchDet: Training Single-Shot Object Detectors from Scratch | [CVPR' 19] |[pdf]

  • Bounding Box Regression with Uncertainty for Accurate Object Detection | [CVPR' 19] |[pdf] | [official code - caffe2]

  • Activity Driven Weakly Supervised Object Detection | [CVPR' 19] |[pdf]

  • Towards Accurate One-Stage Object Detection with AP-Loss | [CVPR' 19] |[pdf]

  • Strong-Weak Distribution Alignment for Adaptive Object Detection | [CVPR' 19] |[pdf] | [official code - pytorch]

  • [NAS-FPN] NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection | [CVPR' 19] |[pdf]

  • [Adaptive NMS] Adaptive NMS: Refining Pedestrian Detection in a Crowd | [CVPR' 19] |[pdf]

  • Point in, Box out: Beyond Counting Persons in Crowds | [CVPR' 19] |[pdf]

  • Locating Objects Without Bounding Boxes | [CVPR' 19] |[pdf]

  • Sampling Techniques for Large-Scale Object Detection from Sparsely Annotated Objects | [CVPR' 19] |[pdf]

  • Towards Universal Object Detection by Domain Attention | [CVPR' 19] |[pdf]

  • Exploring the Bounds of the Utility of Context for Object Detection | [CVPR' 19] |[pdf]

  • What Object Should I Use? - Task Driven Object Detection | [CVPR' 19] |[pdf]

  • Dissimilarity Coefficient based Weakly Supervised Object Detection | [CVPR' 19] |[pdf]

  • Adapting Object Detectors via Selective Cross-Domain Alignment | [CVPR' 19] |[pdf]

  • Fully Quantized Network for Object Detection | [CVPR' 19] |[pdf]

  • Distilling Object Detectors with Fine-grained Feature Imitation | [CVPR' 19] |[pdf]

  • Multi-task Self-Supervised Object Detection via Recycling of Bounding Box Annotations | [CVPR' 19] |[pdf]

  • [Reasoning-RCNN] Reasoning-RCNN: Unifying Adaptive Global Reasoning into Large-scale Object Detection | [CVPR' 19] |[pdf]

  • Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation | [CVPR' 19] |[pdf]

  • Assisted Excitation of Activations: A Learning Technique to Improve Object Detectors | [CVPR' 19] |[pdf]

  • Spatial-aware Graph Relation Network for Large-scale Object Detection | [CVPR' 19] |[pdf]

  • [MaxpoolNMS] MaxpoolNMS: Getting Rid of NMS Bottlenecks in Two-Stage Object Detectors | [CVPR' 19] |[pdf]

  • You reap what you sow: Generating High Precision Object Proposals for Weakly-supervised Object Detection | [CVPR' 19] |[pdf]

  • Object detection with location-aware deformable convolution and backward attention filtering | [CVPR' 19] |[pdf]

  • Diversify and Match: A Domain Adaptive Representation Learning Paradigm for Object Detection | [CVPR' 19] |[pdf]

  • [GFR] Improving Object Detection from Scratch via Gated Feature Reuse | [BMVC' 19] |[pdf] | [official code - pytorch]

  • [Cascade RetinaNet] Cascade RetinaNet: Maintaining Consistency for Single-Stage Object Detection | [BMVC' 19] |[pdf]

  • Soft Sampling for Robust Object Detection | [BMVC' 19] |[pdf]

  • Multi-adversarial Faster-RCNN for Unrestricted Object Detection | [ICCV' 19] |[pdf]

  • Towards Adversarially Robust Object Detection | [ICCV' 19] |[pdf]

  • A Robust Learning Approach to Domain Adaptive Object Detection | [ICCV' 19] |[pdf]

  • A Delay Metric for Video Object Detection: What Average Precision Fails to Tell | [ICCV' 19] |[pdf]

  • Delving Into Robust Object Detection From Unmanned Aerial Vehicles: A Deep Nuisance Disentanglement Approach | [ICCV' 19] |[pdf]

  • Employing Deep Part-Object Relationships for Salient Object Detection | [ICCV' 19] |[pdf]

  • Learning Rich Features at High-Speed for Single-Shot Object Detection | [ICCV' 19] |[pdf]

  • Structured Modeling of Joint Deep Feature and Prediction Refinement for Salient Object Detection | [ICCV' 19] |[pdf]

  • Selectivity or Invariance: Boundary-Aware Salient Object Detection | [ICCV' 19] |[pdf]

  • Progressive Sparse Local Attention for Video Object Detection | [ICCV' 19] |[pdf]

  • Minimum Delay Object Detection From Video | [ICCV' 19] |[pdf]

  • Towards Interpretable Object Detection by Unfolding Latent Structures | [ICCV' 19] |[pdf]

  • Scaling Object Detection by Transferring Classification Weights | [ICCV' 19] |[pdf]

  • [TridentNet] Scale-Aware Trident Networks for Object Detection | [ICCV' 19] |[pdf]

  • Generative Modeling for Small-Data Object Detection | [ICCV' 19] |[pdf]

  • Transductive Learning for Zero-Shot Object Detection | [ICCV' 19] |[pdf]

  • Self-Training and Adversarial Background Regularization for Unsupervised Domain Adaptive One-Stage Object Detection | [ICCV' 19] |[pdf]

  • [CenterNet] CenterNet: Keypoint Triplets for Object Detection | [ICCV' 19] |[pdf]

  • [DAFS] Dynamic Anchor Feature Selection for Single-Shot Object Detection | [ICCV' 19] |[pdf]

  • [Auto-FPN] Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification | [ICCV' 19] |[pdf]

  • Object Guided External Memory Network for Video Object Detection | [ICCV' 19] |[pdf]

  • [ThunderNet] ThunderNet: Towards Real-Time Generic Object Detection on Mobile Devices | [ICCV' 19] |[pdf]

  • [RDN] Relation Distillation Networks for Video Object Detection | [ICCV' 19] |[pdf]

  • [MMNet] Fast Object Detection in Compressed Video | [ICCV' 19] |[pdf]

  • Towards High-Resolution Salient Object Detection | [ICCV' 19] |[pdf]

  • [SCAN] Stacked Cross Refinement Network for Edge-Aware Salient Object Detection | [ICCV' 19] |[pdf] | [official code]

  • Motion Guided Attention for Video Salient Object Detection | [ICCV' 19] |[pdf]

  • Semi-Supervised Video Salient Object Detection Using Pseudo-Labels | [ICCV' 19] |[pdf]

  • Learning to Rank Proposals for Object Detection | [ICCV' 19] |[pdf]

  • [WSOD2] WSOD2: Learning Bottom-Up and Top-Down Objectness Distillation for Weakly-Supervised Object Detection | [ICCV' 19] |[pdf]

  • [ClusDet] Clustered Object Detection in Aerial Images | [ICCV' 19] |[pdf]

  • Towards Precise End-to-End Weakly Supervised Object Detection Network | [ICCV' 19] |[pdf]

  • Few-Shot Object Detection via Feature Reweighting | [ICCV' 19] |[pdf]

  • [Objects365] Objects365: A Large-Scale, High-Quality Dataset for Object Detection | [ICCV' 19] |[pdf]

  • [EGNet] EGNet: Edge Guidance Network for Salient Object Detection | [ICCV' 19] |[pdf]

  • Optimizing the F-Measure for Threshold-Free Salient Object Detection | [ICCV' 19] |[pdf]

  • Sequence Level Semantics Aggregation for Video Object Detection | [ICCV' 19] |[pdf]

  • [NOTE-RCNN] NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection | [ICCV' 19] |[pdf]

  • Enriched Feature Guided Refinement Network for Object Detection | [ICCV' 19] |[pdf]

  • [POD] POD: Practical Object Detection With Scale-Sensitive Network | [ICCV' 19] |[pdf]

  • [FCOS] FCOS: Fully Convolutional One-Stage Object Detection | [ICCV' 19] |[pdf]

  • [RepPoints] RepPoints: Point Set Representation for Object Detection | [ICCV' 19] |[pdf]

  • Better to Follow, Follow to Be Better: Towards Precise Supervision of Feature Super-Resolution for Small Object Detection | [ICCV' 19] |[pdf]

  • Weakly Supervised Object Detection With Segmentation Collaboration | [ICCV' 19] |[pdf]

  • Leveraging Long-Range Temporal Relationships Between Proposals for Video Object Detection | [ICCV' 19] |[pdf]

  • Detecting 11K Classes: Large Scale Object Detection Without Fine-Grained Bounding Boxes | [ICCV' 19] |[pdf]

  • [C-MIDN] C-MIDN: Coupled Multiple Instance Detection Network With Segmentation Guidance for Weakly Supervised Object Detection | [ICCV' 19] |[pdf]

  • Meta-Learning to Detect Rare Objects | [ICCV' 19] |[pdf]

  • [Cap2Det] Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection | [ICCV' 19] |[pdf]

  • [Gaussian YOLOv3] Gaussian YOLOv3: An Accurate and Fast Object Detector using Localization Uncertainty for Autonomous Driving | [ICCV' 19] |[pdf] | [official code - c]

  • [FreeAnchor] FreeAnchor: Learning to Match Anchors for Visual Object Detection | [NeurIPS' 19] |[pdf]

  • Memory-oriented Decoder for Light Field Salient Object Detection | [NeurIPS' 19] |[pdf]

  • One-Shot Object Detection with Co-Attention and Co-Excitation | [NeurIPS' 19] |[pdf]

  • [DetNAS] DetNAS: Backbone Search for Object Detection | [NeurIPS' 19] |[pdf]

  • Consistency-based Semi-supervised Learning for Object detection | [NeurIPS' 19] |[pdf]

  • [NATS] Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection | [NeurIPS' 19] |[pdf]

  • [AA] Learning Data Augmentation Strategies for Object Detection | [arXiv' 19] |[pdf]

  • [EfficientDet] EfficientDet: Scalable and Efficient Object Detection | [arXiv' 19] |[pdf]

2020

  • [Spiking-YOLO] Spiking-YOLO: Spiking Neural Network for Real-time Object Detection | [AAAI' 20] |[pdf]

  • Tell Me What They're Holding: Weakly-supervised Object Detection with Transferable Knowledge from Human-object Interaction | [AAAI' 20] |[pdf]

  • Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression | [AAAI' 20] |[pdf]

  • Computation Reallocation for Object Detection | [ICLR' 20] |[pdf]

Dataset Papers

Statistics of commonly used object detection datasets. The table is taken from this survey paper.

Challenge | Object Classes | Images (Train) | Images (Val) | Images (Test) | Annotated Objects (Train) | Annotated Objects (Val)
PASCAL VOC Object Detection Challenge
VOC07 | 20 | 2,501 | 2,510 | 4,952 | 6,301 (7,844) | 6,307 (7,818)
VOC08 | 20 | 2,111 | 2,221 | 4,133 | 5,082 (6,337) | 5,281 (6,347)
VOC09 | 20 | 3,473 | 3,581 | 6,650 | 8,505 (9,760) | 8,713 (9,779)
VOC10 | 20 | 4,998 | 5,105 | 9,637 | 11,577 (13,339) | 11,797 (13,352)
VOC11 | 20 | 5,717 | 5,823 | 10,994 | 13,609 (15,774) | 13,841 (15,787)
VOC12 | 20 | 5,717 | 5,823 | 10,991 | 13,609 (15,774) | 13,841 (15,787)
ILSVRC Object Detection Challenge
ILSVRC13 | 200 | 395,909 | 20,121 | 40,152 | 345,854 | 55,502
ILSVRC14 | 200 | 456,567 | 20,121 | 40,152 | 478,807 | 55,502
ILSVRC15 | 200 | 456,567 | 20,121 | 51,294 | 478,807 | 55,502
ILSVRC16 | 200 | 456,567 | 20,121 | 60,000 | 478,807 | 55,502
ILSVRC17 | 200 | 456,567 | 20,121 | 65,500 | 478,807 | 55,502
MS COCO Object Detection Challenge
MS COCO15 | 80 | 82,783 | 40,504 | 81,434 | 604,907 | 291,875
MS COCO16 | 80 | 82,783 | 40,504 | 81,434 | 604,907 | 291,875
MS COCO17 | 80 | 118,287 | 5,000 | 40,670 | 860,001 | 36,781
MS COCO18 | 80 | 118,287 | 5,000 | 40,670 | 860,001 | 36,781
Open Images Object Detection Challenge
OID18 | 500 | 1,743,042 | 41,620 | 125,436 | 12,195,144 | -

The main dataset papers for object detection are listed below.

  • [PASCAL VOC] The PASCAL Visual Object Classes (VOC) Challenge | [IJCV' 10] | [pdf]

  • [PASCAL VOC] The PASCAL Visual Object Classes Challenge: A Retrospective | [IJCV' 15] | [pdf] | [link]

  • [ImageNet] ImageNet: A Large-Scale Hierarchical Image Database | [CVPR' 09] | [pdf]

  • [ImageNet] ImageNet Large Scale Visual Recognition Challenge | [IJCV' 15] | [pdf] | [link]

  • [COCO] Microsoft COCO: Common Objects in Context | [ECCV' 14] | [pdf] | [link]

  • [Open Images] The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale | [arXiv' 18] | [pdf] | [link]

  • [DOTA] DOTA: A Large-scale Dataset for Object Detection in Aerial Images | [CVPR' 18] | [pdf] | [link]

  • [Objects365] Objects365: A Large-Scale, High-Quality Dataset for Object Detection | [ICCV' 19] | [link]

Contact & Feedback

If you have any suggestions about papers, feel free to mail me :)