Awesome-Object-Detection

Last Update: 2019/08/16

TODO:

- Fix the link and format issues
- Add paper links to SOTA tables

A list of awesome object detection resources.

We recently released a survey (Recent Advances in Deep Learning for Object Detection) to the community. The survey systematically analyzes existing object detection frameworks and is organized into three major parts: (i) detection components, (ii) learning strategies, and (iii) applications & benchmarks. It covers in detail a variety of factors that affect detection performance, such as detector architectures, feature learning, proposal generation, and sampling strategies. Finally, it discusses several directions to facilitate and spur future research on visual object detection with deep learning.

After completing this survey, we decided to release the object detection resources we collected. We will keep updating both the survey and this collection, since the area moves very fast. If you have any questions or suggestions, please feel free to contact us.

Table of Contents

1. Generic Object Detection
- 1.1 Two-stage Detection Algorithms
- 1.2 One-stage Detection Algorithms
2. Face Detection
3. Pedestrian Detection
4. Benchmarks
5. SOTA
- 5.1 Pascal VOC
- 5.2 MSCOCO
6. Emerging Ideas
7. Other Resources

Citing this work

If this repository is useful, please cite our survey.

@article{wu2019recent,
    title={Recent Advances in Deep Learning for Object Detection},
    author={Xiongwei Wu and Doyen Sahoo and Steven C.H. Hoi},
    journal={arXiv preprint arXiv:1908.03673},
    year={2019}
}

1. Generic Object Detection

1.1 Two-stage Detection

2014 CVPR

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, R. Girshick, J. Donahue, T. Darrell, J. Malik, [OpenAccess], [Supplementary], [Caffe], R-CNN

2014 ECCV

Spatial pyramid pooling in deep convolutional networks for visual recognition, K. He, X. Zhang, S. Ren, J. Sun, [Arxiv], [Caffe-Matlab], SPP-Net

2015 CVPR

Deepid-net: Deformable deep convolutional neural networks for object detection, W. Ouyang, X. Wang, X. Zeng, S. Qiu, P. Luo, Y. Tian, H. Li, S. Yang, Z. Wang, C.-C. Loy, [OpenAccess]
segdeepm: Exploiting segmentation and context in deep neural networks for object detection, Y. Zhu, R. Urtasun, R. Salakhutdinov, S. Fidler, [OpenAccess]
Deformable part models are convolutional neural networks, R. Girshick, F. Iandola, T. Darrell, J. Malik, [OpenAccess]

2015 ICCV

Fast r-cnn, R. Girshick, [OpenAccess], [Caffe-Python], Fast R-CNN
Object detection via a multi-region and semantic segmentation-aware cnn model, S. Gidaris, N. Komodakis, [OpenAccess], [Caffe], MR-CNN
Deepproposal: Hunting objects by cascading deep convolutional layers, A. Ghodrati, A. Diba, M. Pedersoli, T. Tuytelaars, L. Van Gool, [OpenAccess], [MatConvnet], Deepproposal

2015 NeurIPS

Faster r-cnn: Towards real-time object detection with region proposal networks, S. Ren, K. He, R. Girshick, J. Sun, [OpenAccess], [Arxiv], [Caffe-Matlab], [Caffe-Python], [Pytorch], [TensorFlow], [MXNet], Faster R-CNN

2016 CVPR

Hypernet: Towards accurate region proposal generation and joint object detection, T. Kong, A. Yao, Y. Chen, F. Sun, [OpenAccess], HyperNet
Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks, S. Bell, C. Lawrence Zitnick, K. Bala, R. Girshick, [OpenAccess], ION
Object detection from video tubelets with convolutional neural networks, K. Kang, W. Ouyang, H. Li, X. Wang, [OpenAccess], [Caffe], T-CNN
Instance-aware semantic segmentation via multitask network cascades, J. Dai, K. He, J. Sun, [OpenAccess], [Caffe], MNC
Adaptive object detection using adjacency and zoom prediction, Y. Lu, T. Javidi, S. Lazebnik, [Arxiv], [Caffe], AZ-Net
Training region-based object detectors with online hard example mining, A. Shrivastava, A. Gupta, R. Girshick, [OpenAccess], [Caffe], OHEM
Locnet: Improving localization accuracy for object detection, S. Gidaris, N. Komodakis, [OpenAccess], [Matlab], LocNet
Craft objects from images, B. Yang, J. Yan, Z. Lei, S. Z. Li, [OpenAccess], [Caffe], CRAFT

2016 ECCV

Contextual priming and feedback for faster r-cnn, A. Shrivastava, A. Gupta, [OpenAccess]
Gated bi-directional cnn for object detection, X. Zeng, W. Ouyang, B. Yang, J. Yan, X. Wang, [OpenAccess]

2016 NeurIPS

R-fcn: Object detection via region-based fully convolutional networks, J. Dai, Y. Li, K. He, J. Sun, [OpenAccess], [Caffe-Matlab], [Caffe-Python], R-FCN

2016 Others

Beyond skip connections: Top-down modulation for object detection, A. Shrivastava, R. Sukthankar, J. Malik, A. Gupta, in: arXiv preprint arXiv:1612.06851, 2016. [Arxiv], TDM
A multipath network for object detection, S. Zagoruyko, A. Lerer, T.-Y. Lin, P. O. Pinheiro, S. Gross, S. Chintala, P. Dollar, in: BMVC, 2016. [Arxiv], [Torch], MultiPathNet
Pvanet: deep but lightweight neural networks for real-time object detection, K.-H. Kim, S. Hong, B. Roh, Y. Cheon, M. Park, in: arXiv preprint arXiv:1608.08021, 2016. [Arxiv], [Caffe], PVANet

2017 CVPR

Feature pyramid networks for object detection, T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, S. Belongie, [OpenAccess], [Caffe2], FPN
Perceptual generative adversarial networks for small object detection, J. Li, X. Liang, Y. Wei, T. Xu, J. Feng, S. Yan, [OpenAccess], PGAN
A-fast-rcnn: Hard positive generation via adversary for object detection, X. Wang, A. Shrivastava, A. Gupta, [OpenAccess], [Caffe], A-Fast-RCNN
Mimicking very efficient network for object detection, Q. Li, S. Jin, J. Yan, [OpenAccess]
Learning non-maximum suppression, J. Hosang, R. Benenson, B. Schiele, [OpenAccess], [TensorFlow]
Speed/accuracy trade-offs for modern convolutional object detectors, J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, et al., [OpenAccess], [TensorFlow]

2017 ICCV

Mask R-CNN, K. He, G. Gkioxari, P. Dollar, R. Girshick, [OpenAccess], [Caffe2], [Slides], Mask R-CNN
Denet: Scalable real-time object detection with directed sparse sampling, L. Tychsen-Smith, L. Petersson, [OpenAccess], [Theano], DeNet
Deformable convolutional networks, J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, Y. Wei, [OpenAccess], [MXNet], DCN
Couplenet: Coupling global structure with local parts for object detection, Y. Zhu, C. Zhao, J. Wang, X. Zhao, Y. Wu, H. Lu, [OpenAccess], [Caffe], CoupleNet
Spatial memory for context reasoning in object detection, X. Chen, A. Gupta, [OpenAccess], SMN
Soft-nms – improving object detection with one line of code, N. Bodla, B. Singh, R. Chellappa, L. S. Davis, [OpenAccess], [Caffe]

2017 Others

Light-head rcnn: In defense of two-stage object detector, Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng, J. Sun, in: arXiv preprint arXiv:1711.07264, 2017. [Arxiv], [Pytorch], [TensorFlow]
Zoom out-and-in network with recursive training for object proposal, H. Li, Y. Liu, W. Ouyang, X. Wang, in: arXiv preprint arXiv:1702.05711, 2017. [Arxiv]

2018 CVPR

Cascade r-cnn: Delving into high quality object detection, Z. Cai, N. Vasconcelos, [OpenAccess], [Caffe], [Caffe2], Cascade R-CNN
Detnet: A backbone network for object detection, Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng, J. Sun, [OpenAccess], [Pytorch*], DetNet
An analysis of scale invariance in object detection–snip, B. Singh, L. S. Davis, [OpenAccess], [MXNet], SNIP
Multi-scale location-aware kernel representation for object detection, H. Wang, Q. Wang, M. Gao, P. Li, W. Zuo, [OpenAccess], [Caffe], MLKR
Feature selective networks for object detection, Y. Zhai, J. Fu, Y. Lu, H. Li, [OpenAccess]
Pseudo mask augmented object detection, X. Zhao, S. Liang, Y. Wei, [OpenAccess]
Structure inference net: Object detection using scene-level context and instance-level relationships, Y. Liu, R. Wang, S. Shan, X. Chen, [OpenAccess], [TensorFlow], SIN
Relation networks for object detection, H. Hu, J. Gu, Z. Zhang, J. Dai, Y. Wei, [OpenAccess], [MXNet]
Path Aggregation Network for Instance Segmentation, S. Liu, L. Qi, H. Qin, J. Shi and J. Jia, [OpenAccess], [Pytorch]

2018 ECCV

Acquisition of localization confidence for accurate object detection, B. Jiang, R. Luo, J. Mao, T. Xiao, Y. Jiang, [OpenAccess], [Pytorch], IoU-Net
Revisiting rcnn: On awakening the classification power of faster rcnn, B. Cheng, Y. Wei, H. Shi, R. Feris, J. Xiong, T. Huang, [OpenAccess], [MXNet]
Learning region features for object detection, J. Gu, H. Hu, L. Wang, Y. Wei, J. Dai, [OpenAccess]
Deep regionlets for object detection, H. Xu, X. Lv, X. Wang, Z. Ren, R. Chellappa, [OpenAccess]
Context refinement for object detection, Z. Chen, S. Huang, D. Tao, [OpenAccess]

2018 NeurIPS

Metaanchor: Learning to detect objects with customized anchors, T. Yang, X. Zhang, Z. Li, W. Zhang, J. Sun, [OpenAccess], MetaAnchor
Sniper: Efficient multi-scale training, B. Singh, M. Najibi, L. S. Davis, [OpenAccess], [MXNet], SNIPER

2019 AAAI

Derpn: Taking a further step toward more general object detection, L. Xie, Y. Liu, L. Jin, Z. Xie, [OpenAccess], [Caffe], DeRPN
Object Detection based on Region Decomposition and Assembly, S.-H Bae, [OpenAccess], R-DAD

2019 CVPR

Mask scoring r-cnn, Z. Huang, L. Huang, Y. Gong, C. Huang, X. Wang, [OpenAccess], [Pytorch], Mask Scoring R-CNN
Deformable convnets v2: More deformable, better results, X. Zhu, H. Hu, S. Lin, J. Dai, [OpenAccess], [MXNet], DCNv2
Grid r-cnn, X. Lu, B. Li, Y. Yue, Q. Li, J. Yan, [OpenAccess], [mmdetection]
Nas-fpn: Learning scalable feature pyramid architecture for object detection, G. Ghiasi, T.-Y. Lin, Q. V. Le, [OpenAccess], [TensorFlow], NAS-FPN
Bounding Box Regression with Uncertainty for Accurate Object Detection, Y. He, C. Zhu, J. Wang, M. Savvides, X. Zhang, [OpenAccess], [Caffe2], KL-Loss
Libra R-CNN: Towards Balanced Learning for Object Detection, J. Pang, K. Chen, J. Shi, H. Feng, W. Ouyang, D. Lin, [OpenAccess], [Pytorch], [mmdetection], Libra R-CNN
Region Proposal by Guided Anchoring, J. Wang, K. Chen, S. Yang, C. C. Loy, D. Lin, [OpenAccess], [mmdetection]

2019 ICCV

Rethinking imagenet pre-training, K. He, R. Girshick, P. Dollar, [OpenAccess]

2019 Others

Scale-aware trident networks for object detection, Y. Li, Y. Chen, N. Wang, Z. Zhang, in: arXiv preprint arXiv:1901.01892, 2019. [OpenAccess], [MXNet], TridentNet

2019 NeurIPS

1.2 One-stage Detection

Before 2014

Overfeat: Integrated recognition, localization and detection using convolutional networks, P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun, in: arXiv preprint arXiv:1312.6229, 2013. [Arxiv], [Torch], Overfeat

2016 CVPR

You only look once: Unified, real-time object detection, J. Redmon, S. Divvala, R. Girshick, A. Farhadi, [OpenAccess], [DarkNet], YOLO

2016 ECCV

SSD: Single shot multibox detector, W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, [OpenAccess], [Caffe], SSD

2017 CVPR

Yolo9000: better, faster, stronger, J. Redmon, A. Farhadi, [OpenAccess], [DarkNet], YOLOv2
Ron: Reverse connection with objectness prior networks for object detection, T. Kong, F. Sun, A. Yao, H. Liu, M. Lu, Y. Chen, [OpenAccess], [Caffe], RON

2017 ICCV

Focal loss for dense object detection, T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollar, [OpenAccess], [Caffe2], RetinaNet
Dsod: Learning deeply supervised object detectors from scratch, Z. Shen, Z. Liu, J. Li, Y.-G. Jiang, Y. Chen, X. Xue, [OpenAccess], [Caffe], DSOD

2017 Others

Dssd: Deconvolutional single shot detector, C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, A. C. Berg, in: arXiv preprint arXiv:1701.06659, 2017. [OpenAccess], [Caffe], DSSD
Residual features and unified prediction network for single stage detection, K. Lee, J. Choi, J. Jeong, N. Kwak, in: arXiv preprint arXiv:1707.05031, 2017. [OpenAccess]
Enhancement of ssd by concatenating feature maps for object detection, J. Jeong, H. Park, N. Kwak, in: arXiv preprint arXiv:1705.09587, 2017. [OpenAccess]
Fssd: Feature fusion single shot multibox detector, Z. Li, F. Zhou, in: arXiv preprint arXiv:1712.00960, 2017. [OpenAccess], FSSD
Learning object detectors from scratch with gated recurrent feature pyramids, Z. Shen, H. Shi, R. Feris, L. Cao, S. Yan, D. Liu, X. Wang, X. Xue, T. S. Huang, in: arXiv preprint arXiv:1712.00886, 2017. [OpenAccess], [Caffe]

2018 CVPR

Single-shot refinement neural network for object detection, S. Zhang, L. Wen, X. Bian, Z. Lei, S. Z. Li, [OpenAccess], [Caffe], RefineDet
Scale-transferrable object detection, P. Zhou, B. Ni, C. Geng, J. Hu, Y. Xu, [OpenAccess], [Pytorch], STDN
Single-shot object detection with enriched semantics, Z. Zhang, S. Qiao, C. Xie, W. Shen, B. Wang, A. L. Yuille, [OpenAccess], [Caffe], DES

2018 ECCV

Cornernet: Detecting objects as paired keypoints, H. Law, J. Deng, [OpenAccess], [Pytorch], CornerNet
Receptive field block net for accurate and fast object detection, S. Liu, D. Huang, Y. Wang, [OpenAccess], [Pytorch], RFBNet
Deep feature pyramid reconfiguration for object detection, T. Kong, F. Sun, W. Huang, H. Liu, [OpenAccess]

2018 Others

YOLOv3: An Incremental Improvement, J. Redmon, A. Farhadi, in: arXiv preprint arXiv:1804.02767, 2018. [OpenAccess], [DarkNet], YOLOv3
Mdssd: Multi-scale deconvolutional single shot detector for small objects, M. Xu, L. Cui, P. Lv, X. Jiang, J. Niu, B. Zhou, M. Wang, in: arXiv preprint arXiv:1805.07009, 2018. [Arxiv], MDSSD

2019 AAAI

M2det: A single-shot object detector based on multi-level feature pyramid network, Q. Zhao, T. Sheng, Y. Wang, Z. Tang, Y. Chen, L. Cai, H. Ling, [OpenAccess], [Pytorch], M2Det
Gradient harmonized single-stage detector, B. Li, Y. Liu, X. Wang, [OpenAccess], [mmdetection], GHM

2019 CVPR

Feature selective anchor-free module for single-shot object detection, C. Zhu, Y. He, M. Savvides, [OpenAccess], FSAF
Scratchdet: Exploring to train single-shot object detectors from scratch, R. Zhu, S. Zhang, X. Wang, L. Wen, H. Shi, L. Bo, T. Mei, [OpenAccess], [Caffe], Scratchdet
Bottom-up object detection by grouping extreme and center points, X. Zhou, J. Zhuo, P. Krahenbuhl, [OpenAccess], [Pytorch], ExtremeNet
Towards Accurate One-Stage Object Detection with AP-Loss, K. Chen, J. Li, W. Lin, J. See, J. Wang, L. Duan, Z. Chen, C. He, J. Zou, [OpenAccess], AP-Loss

2019 ICCV

Fcos: Fully convolutional one-stage object detection, Z. Tian, C. Shen, H. Chen, T. He, [OpenAccess], [Pytorch], FCOS
RepPoints: Point Set Representation for Object Detection, Z. Yang, S. Liu, H. Hu, L. Wang, S. Lin, [OpenAccess], RepPoints

2019 Others

Objects as points, X. Zhou, D. Wang, P. Krähenbühl, in: arXiv preprint arXiv:1904.07850, 2019, [Arxiv], [Pytorch], CenterNet
Centernet: Keypoint triplets for object detection, K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, Q. Tian, in: arXiv preprint arXiv:1904.08189, 2019, [Arxiv], [Pytorch], CenterNet
CornerNet-Lite: Efficient Keypoint Based Object Detection, Hei Law, Yun Teng, Olga Russakovsky, Jia Deng, in: arXiv preprint arXiv:1904.08900, 2019, [OpenAccess], [Pytorch], CornerNet-Lite
Revisiting Feature Alignment for One-stage Object Detection, Y. Chen, C. Han, N. Wang, Z. Zhang, in: arXiv preprint arXiv:1908.01570, 2019, [OpenAccess], AlignDet
PosNeg-Balanced Anchors with Aligned Features for Single-Shot Object Detection, Qiankun Tang, Shice Liu, Jie Li, Yu Hu, in: arXiv preprint arXiv:1908.03295, 2019, [OpenAccess], [Pytorch], PADet
Cascade RetinaNet: Maintaining Consistency for Single-Stage Object Detection, Q. Tang, S. Liu, J. Li, Y. Hu, in: BMVC, 2019, [OpenAccess], CaRetinaNet

2. Face Detection

Joint face detection and alignment using multi-task cascaded convolutional networks, K. Zhang, Z. Zhang, Z. Li, Y. Qiao, in: IEEE Signal Processing Letters, 2016. [OpenAccess], [Caffe], MTCNN
Detecting faces using region-based fully convolutional networks, Y. Wang, X. Ji, Z. Zhou, H. Wang, Z. Li, in: arXiv preprint arXiv:1709.05256, 2017. [OpenAccess], Face R-FCN
Detecting faces using inside cascaded contextual cnn, K. Zhang, Z. Zhang, H. Wang, Z. Li, Y. Qiao, W. Liu, in: ICCV, 2017. [OpenAccess]
Cms-rcnn: Contextual multiscale region-based cnn for unconstrained face detection, C. Zhu, Y. Zheng, K. Luu, M. Savvides, in: Deep Learning for Biometrics, 2017. [OpenAccess], CMS-RCNN
Face r-cnn, H. Wang, Z. Li, X. Ji, Y. Wang, in: arXiv preprint arXiv:1706.01061, 2017. [OpenAccess], Face R-CNN
Scale-aware face detection, Z. Hao, Y. Liu, H. Qin, J. Yan, X. Li, X. Hu, in: CVPR, 2017. [OpenAccess]
Ssh: Single stage headless face detector, M. Najibi, P. Samangouei, R. Chellappa, L. Davis, in: ICCV, 2017. [OpenAccess], [Caffe], SSH
Feature agglomeration networks for single stage face detection, J. Zhang, X. Wu, J. Zhu, S. C. Hoi, in: arXiv preprint arXiv:1712.00721, 2017. [OpenAccess], FANet
Finding tiny faces, P. Hu, D. Ramanan, [OpenAccess], [MatConvNet]
S3fd: Single shot scale-invariant face detector, S. Zhang, X. Zhu, Z. Lei, H. Shi, X. Wang, S. Z. Li, [OpenAccess], [Caffe], S3FD
Recurrent scale approximation for object detection in cnn, Y. Liu, H. Li, J. Yan, F. Wei, X. Wang, X. Tang, [OpenAccess], [Caffe], RSA
Anchor cascade for efficient face detection, B. Yu, D. Tao, in: arXiv preprint arXiv:1805.03363, 2018. [OpenAccess]
Face detection using improved faster rcnn, C. Zhang, X. Xu, D. Tu, in: arXiv preprint arXiv:1802.02142, 2018. [OpenAccess], [Caffe]
Face-magnet: Magnifying feature maps to detect small faces, P. Samangouei, M. Najibi, L. Davis, R. Chellappa, in: arXiv preprint arXiv:1803.05258, 2018. [OpenAccess], [Caffe]
Selective refinement network for high performance face detection, C. Chi, S. Zhang, J. Xing, Z. Lei, S. Z. Li, X. Zou, in: arXiv preprint arXiv:1809.02693, 2018. [OpenAccess], [Pytorch], SRN
Pyramidbox: A context-assisted single shot face detector, X. Tang, D. K. Du, Z. He, J. Liu, in: ECCV, 2018. [OpenAccess], [TensorFlow]
Face detection using deep learning: An improved faster rcnn approach, X. Sun, P. Wu, S. C. Hoi, in: Neurocomputing, 2018. [OpenAccess]
Seeing small faces from robust anchors perspective, C. Zhu, R. Tao, K. Luu, M. Savvides, [OpenAccess]
Dsfd: Dual shot face detector, J. Li, Y. Wang, C. Wang, Y. Tai, J. Qian, J. Yang, C. Wang, J. Li, F. Huang, in: CVPR, 2019. [OpenAccess], [Pytorch], DSFD

3. Pedestrian Detection

Bilattice-based logical reasoning for human detection, V. D. Shet, J. Neumann, V. Ramesh, L. S. Davis, in: CVPR, 2007. [OpenAccess]
Integral channel features, P. Dollar, Z. Tu, P. Perona, S. Belongie, in: BMVC, 2009. [OpenAccess], [Project], ICF
A structural filter approach to human detection, G. Duan, H. Ai, S. Lao, in: ECCV, 2010. [OpenAccess]
Multi-cue pedestrian classification with partial occlusion handling, M. Enzweiler, A. Eigenstetter, B. Schiele, D. M. Gavrila, in: CVPR, 2010. [OpenAccess]
A discriminative deep model for pedestrian detection with occlusion handling, W. Ouyang, X. Wang, in: CVPR, 2012. [OpenAccess]
Modeling mutual visibility relationship in pedestrian detection, W. Ouyang, X. Zeng, X. Wang, in: CVPR, 2013. [OpenAccess]
Single-pedestrian detection aided by multi-pedestrian detection, W. Ouyang, X. Wang, in: CVPR, 2013. [OpenAccess]
Pedestrian detection with unsupervised multi-stage feature learning, P. Sermanet, K. Kavukcuoglu, S. Chintala, Y. LeCun, in: CVPR, 2013. [OpenAccess]
Joint deep learning for pedestrian detection, W. Ouyang, X. Wang, in: ICCV, 2013. [OpenAccess]
Handling occlusions with franken-classifiers, M. Mathias, R. Benenson, R. Timofte, L. Van Gool, in: ICCV, 2013. [OpenAccess]
Ten years of pedestrian detection, what have we learned?, R. Benenson, M. Omran, J. Hosang, B. Schiele, in: ECCV, 2014. [OpenAccess]
Detection and tracking of occluded people, S. Tang, M. Andriluka, B. Schiele, in: IJCV, 2014. [OpenAccess]
Learning complexity-aware cascades for deep pedestrian detection, Z. Cai, M. Saberian, N. Vasconcelos, in: ICCV, 2015. [OpenAccess]
Taking a deeper look at pedestrians, J. Hosang, M. Omran, R. Benenson, B. Schiele, [OpenAccess]
Deep learning strong parts for pedestrian detection, Y. Tian, P. Luo, X. Wang, X. Tang, in: CVPR, 2015. [OpenAccess]
A unified multi-scale deep convolutional neural network for fast object detection, Z. Cai, Q. Fan, R. S. Feris, N. Vasconcelos, [OpenAccess], [Caffe], MSCNN
Dave: A unified framework for fast vehicle detection and annotation, Y. Zhou, L. Liu, L. Shao, M. Mellor, in: ECCV, 2016. [OpenAccess]
Is faster r-cnn doing well for pedestrian detection?, L. Zhang, L. Lin, X. Liang, K. He, in: ECCV, 2016. [OpenAccess], [Caffe]
Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers, F. Yang, W. Choi, Y. Lin, [OpenAccess], SDP-CRC
Accurate single stage detector using recurrent rolling convolution, J. Ren, X. Chen, J. Liu, W. Sun, J. Pang, Q. Yan, Y.-W. Tai, L. Xu, [OpenAccess], [Caffe], RRC
What can help pedestrian detection?, J. Mao, T. Xiao, Y. Jiang, Z. Cao, in: CVPR, 2017. [OpenAccess]
Learning cross-modal deep representations for robust pedestrian detection, D. Xu, W. Ouyang, E. Ricci, X. Wang, N. Sebe, in: CVPR, 2017. [OpenAccess], [Caffe], CMT-CNN
Repulsion loss: Detecting pedestrians in a crowd, X. Wang, T. Xiao, Y. Jiang, S. Shao, J. Sun, C. Shen, in: CVPR, 2018. [OpenAccess], [Pytorch]
Bi-box regression for pedestrian detection and occlusion estimation, C. Zhou, J. Yuan, in: ECCV, 2018. [OpenAccess]
Occlusion-aware r-cnn: Detecting pedestrians in a crowd, S. Zhang, L. Wen, X. Bian, Z. Lei, S. Z. Li, in: ECCV, 2018. [OpenAccess], OR R-CNN
Scale-aware fast r-cnn for pedestrian detection, J. Li, X. Liang, S. Shen, T. Xu, J. Feng, S. Yan, in: TMM, 2018. [Arxiv], SAF R-CNN
Pcn: Part and context information for pedestrian detection with cnns, S. Wang, J. Cheng, H. Liu, M. Tang, in: arXiv preprint arXiv:1804.04483, 2018. [OpenAccess]

4. Benchmarks

4.1 Generic Detection Datasets

Pascal VOC: The pascal visual object classes (voc) challenge, M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman, [OpenAccess], [Project]
ImageNet: Imagenet: A large-scale hierarchical image database, J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, [OpenAccess], [Project]
MSCOCO: Microsoft COCO: Common Objects in Context, T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, C. L. Zitnick, [OpenAccess], [Project]
Open Images: The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale, A. Kuznetsova, H. Rom, N. Alldrin, J. Uijlings, I. Krasin, J. Pont-Tuset, S. Kamali, S. Popov, M. Malloci, T. Duerig, et al., [OpenAccess], [Project]
LVIS: Lvis: A dataset for large vocabulary instance segmentation, A. Gupta, P. Dollar, R. Girshick, [OpenAccess], [Project]

4.2 Face Detection Datasets

WIDER FACE: Wider face: A face detection benchmark, S. Yang, P. Luo, C.-C. Loy, X. Tang, [OpenAccess], [Project]
FDDB: Fddb: A benchmark for face detection in unconstrained settings, V. Jain, E. Learned-Miller, [OpenAccess], [Project]
PASCAL FACE: The pascal visual object classes (voc) challenge, M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman, [OpenAccess], [Project]
MALF: Fine-grained evaluation on face detection in the wild, B. Yang, J. Yan, Z. Lei, S. Z. Li, in: Automatic Face and Gesture Recognition (FG), [OpenAccess], [Project]
AFW: Face detection, pose estimation and landmark localization in the wild, X. Zhu, D. Ramanan, [OpenAccess], [Project]

4.3 Pedestrian Detection Datasets

CityPersons: Citypersons: A diverse dataset for pedestrian detection, S. Zhang, R. Benenson, B. Schiele, [OpenAccess], [Project]
Caltech: Pedestrian detection: An evaluation of the state of the art, P. Dollar, C. Wojek, B. Schiele, P. Perona, [OpenAccess], [Project]
ETH: Depth and appearance for mobile scene analysis, A. Ess, B. Leibe, L. Van Gool, [OpenAccess], [Project]
INRIA: Histograms of oriented gradients for human detection, N. Dalal, B. Triggs, [OpenAccess], [Project]
KITTI: Vision meets robotics: The kitti dataset, A. Geiger, P. Lenz, C. Stiller, R. Urtasun, [OpenAccess], [Project]

5. SOTA

5.1 Pascal VOC

Method	Backbone	Proposed Year	Input size (Test)	VOC2007	VOC2012
Two-stage
R-CNN	VGG-16	2014	Arbitrary	66.0∗	62.4†
SPP-net	VGG-16	2014	~600 × 1000	63.1∗	-
Fast R-CNN	VGG-16	2015	~600 × 1000	70.0	68.4
Faster R-CNN	VGG-16	2015	~600 × 1000	73.2	70.4
MR-CNN	VGG-16	2015	Multi-Scale	78.2	73.9
Faster R-CNN	ResNet-101	2016	~600 × 1000	76.4	73.8
R-FCN	ResNet-101	2016	~600 × 1000	80.5	77.6
OHEM	VGG-16	2016	~600 × 1000	74.6	71.9
HyperNet	VGG-16	2016	~600 × 1000	76.3	71.4
ION	VGG-16	2016	~600 × 1000	79.2	76.4
CRAFT	VGG-16	2016	~600 × 1000	75.7	71.3†
LocNet	VGG-16	2016	~600 × 1000	78.4	74.8†
R-FCN w DCN	ResNet-101	2017	~600 × 1000	82.6	-
CoupleNet	ResNet-101	2017	~600 × 1000	82.7	80.4
DeNet512(wide)	ResNet-101	2017	~512 × 512	77.1	73.9
FPN-Reconfig	ResNet-101	2018	~600 × 1000	82.4	81.1
DeepRegionLet	ResNet-101	2018	~600 × 1000	83.3	81.3
DCN+R-CNN	ResNet-101+ResNet-152	2018	Arbitrary	84.0	81.2
One-stage
YOLOv1	VGG-16	2016	448 × 448	66.4	57.9
SSD512	VGG-16	2016	512 × 512	79.8	78.5
YOLOv2	Darknet	2017	544 × 544	78.6	73.5
DSSD513	ResNet-101	2017	513 × 513	81.5	80.0
DSOD300	DS/64-192-48-1	2017	300 × 300	77.7	76.3
RON384	VGG-16	2017	384 × 384	75.4	73.0
STDN513	DenseNet-169	2018	513 × 513	80.9	-
RefineDet512	VGG-16	2018	512 × 512	81.8	80.1
RFBNet512	VGG-16	2018	512 × 512	82.2	-
CenterNet	ResNet-101	2019	512 × 512	78.7	-
CenterNet	DLA	2019	512 × 512	80.7	-

∗: The model is trained on the VOC2007 trainval set only. †: The model is trained on the VOC2012 trainval set only.

5.2 MSCOCO

Method	Backbone	Year	AP	AP$_{50}$	AP$_{75}$	AP$_{S}$	AP$_{M}$	AP$_{L}$
Two-stage
Fast R-CNN	VGG-16	2015	19.7	35.9	-	-	-	-
Faster R-CNN	VGG-16	2015	21.9	42.7	-	-	-	-
OHEM	VGG-16	2016	22.6	42.5	22.2	5.0	23.7	37.9
ION	VGG-16	2016	23.6	43.2	23.6	6.4	24.1	38.3
OHEM++	VGG-16	2016	25.5	45.9	26.1	7.4	27.7	40.3
R-FCN	ResNet-101	2016	29.9	51.9	-	10.8	32.8	45.0
Faster R-CNN+++	ResNet-101	2016	34.9	55.7	37.4	15.6	38.7	50.9
Faster R-CNN w FPN	ResNet-101	2016	36.2	59.1	39.0	18.2	39.0	48.2
DeNet-101(wide)	ResNet-101	2017	33.8	53.4	36.1	12.3	36.1	50.8
CoupleNet	ResNet-101	2017	34.4	54.8	37.2	13.4	38.1	50.8
Faster R-CNN by G-RMI	Inception-ResNet-v2	2017	34.7	55.5	36.7	13.5	38.1	52.0
Deformable R-FCN	Aligned-Inception-ResNet	2017	37.5	58.0	40.8	19.4	40.1	52.5
Mask-RCNN	ResNeXt-101	2017	39.8	62.3	43.4	22.1	43.2	51.2
umd det	ResNet-101	2017	40.8	62.4	44.9	23.0	43.4	53.2
Fitness-NMS	ResNet-101	2017	41.8	60.9	44.9	21.5	45.0	57.5
DCN w Relation Net	ResNet-101	2018	39.0	58.6	42.9	-	-	-
DeepRegionlets	ResNet-101	2018	39.3	59.8	-	21.7	43.7	50.9
C-Mask RCNN	ResNet-101	2018	42.0	62.9	46.4	23.4	44.7	53.8
Group Norm	ResNet-101	2018	42.3	62.8	46.2	-	-	-
DCN+R-CNN	ResNet-101+ResNet-152	2018	42.6	65.3	46.5	26.4	46.1	56.4
Cascade R-CNN	ResNet-101	2018	42.8	62.1	46.3	23.7	45.5	55.2
SNIP++	DPN-98	2018	45.7	67.3	51.1	29.3	48.8	57.1
SNIPER++	ResNet-101	2018	46.1	67.0	51.6	29.6	48.9	58.1
PANet++	ResNeXt-101	2018	47.4	67.2	51.8	30.1	51.7	60.0
Grid R-CNN	ResNeXt-101	2019	43.2	63.0	46.6	25.1	46.5	55.2
DCN-v2	ResNet-101	2019	44.8	66.3	48.8	24.4	48.1	59.6
DCN-v2++	ResNet-101	2019	46.0	67.9	50.8	27.8	49.1	59.5
TridentNet	ResNet-101	2019	42.7	63.6	46.5	23.9	46.6	56.6
TridentNet	ResNet-101-Deformable	2019	48.4	69.7	53.5	31.8	51.3	60.3
One-stage
SSD512	VGG-16	2016	28.8	48.5	30.3	10.9	31.8	43.5
RON384++	VGG-16	2017	27.4	49.5	27.1	-	-	-
YOLOv2	DarkNet-19	2017	21.6	44.0	19.2	5.0	22.4	35.5
SSD513	ResNet-101	2017	31.2	50.4	33.3	10.2	34.5	49.8
DSSD513	ResNet-101	2017	33.2	53.3	35.2	13.0	35.4	51.1
RetinaNet800++	ResNet-101	2017	39.1	59.1	42.3	21.8	42.7	50.2
STDN513	DenseNet-169	2018	31.8	51.0	33.6	14.4	36.1	43.4
FPN-Reconfig	ResNet-101	2018	34.6	54.3	37.3	-	-	-
RefineDet512	ResNet-101	2018	36.4	57.5	39.5	16.6	39.9	51.4
RefineDet512++	ResNet-101	2018	41.8	62.9	45.7	25.6	45.1	54.1
GHM SSD	ResNeXt-101	2018	41.6	62.8	44.2	22.3	45.1	55.3
CornerNet511	Hourglass-104	2018	40.5	56.5	43.1	19.4	42.7	53.9
CornerNet511++	Hourglass-104	2018	42.1	57.8	45.3	20.8	44.8	56.7
M2Det800	VGG-16	2019	41.0	59.7	45.0	22.1	46.5	53.8
M2Det800++	VGG-16	2019	44.2	64.6	49.3	29.2	47.9	55.1
ExtremeNet	Hourglass-104	2019	40.2	55.5	43.2	20.4	43.2	53.1
CenterNet-HG	Hourglass-104	2019	42.1	61.1	45.9	24.1	45.5	52.8
FCOS	ResNeXt-101	2019	42.1	62.1	45.2	25.6	44.9	52.0
FSAF	ResNeXt-101	2019	42.9	63.8	46.3	26.6	46.2	52.7
CenterNet511	Hourglass-104	2019	44.9	62.4	48.1	25.6	47.4	57.4
CenterNet511++	Hourglass-104	2019	47.0	64.5	50.7	28.9	49.9	58.9
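
For readers new to the COCO columns above: AP is averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05, AP$_{50}$ and AP$_{75}$ use a single IoU threshold of 0.50 and 0.75, and AP$_{S}$, AP$_{M}$, AP$_{L}$ restrict evaluation to small (area below 32² pixels), medium, and large (area above 96² pixels) objects. The snippet below is only a minimal sketch of the IoU test behind those thresholds. It assumes boxes given as (x1, y1, x2, y2) corners, and the names `box_iou`, `is_match` and `COCO_IOU_THRESHOLDS` are hypothetical helpers for illustration, not part of the official COCO evaluation code.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format (assumed)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, iou_threshold):
    """A prediction can count as a true positive only if its IoU reaches the threshold."""
    return box_iou(pred_box, gt_box) >= iou_threshold


# The ten thresholds over which the headline AP column is averaged.
COCO_IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

if __name__ == "__main__":
    pred, gt = (0, 0, 10, 10), (0, 0, 10, 8)
    print(box_iou(pred, gt))
    print([t for t in COCO_IOU_THRESHOLDS if is_match(pred, gt, t)])
```

A full evaluation additionally ranks predictions by confidence, matches each ground-truth box at most once, and integrates the precision-recall curve; the official COCO toolkit handles those steps.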

6. Emerging Ideas

6.1 Anchor Design

6.1.1 Anchor-Free Methods

Denet: Scalable real-time object detection with directed sparse sampling, L. Tychsen-Smith, L. Petersson, in: ICCV, 2017. [OpenAccess], [Theano], DeNet
Cornernet: Detecting objects as paired keypoints, H. Law, J. Deng, in: ECCV, 2018. [OpenAccess], [Pytorch], CornerNet
Objects as points, X. Zhou, D. Wang, P. Krähenbühl, in: arXiv preprint arXiv:1904.07850, 2019. [Arxiv], [Pytorch], CenterNet
Centernet: Keypoint triplets for object detection, K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, Q. Tian, in: arXiv preprint arXiv:1904.08189, 2019. [Arxiv], [Pytorch], CenterNet
Bottom-up object detection by grouping extreme and center points, X. Zhou, J. Zhuo, P. Krahenbuhl, in: CVPR, 2019. [OpenAccess], [Pytorch], ExtremeNet
Feature selective anchor-free module for single-shot object detection, C. Zhu, Y. He, M. Savvides, in: CVPR, 2019. [OpenAccess], FSAF
Fcos: Fully convolutional one-stage object detection, Z. Tian, C. Shen, H. Chen, T. He, in: ICCV, 2019. [OpenAccess], [Pytorch], FCOS
CornerNet-Lite: Efficient Keypoint Based Object Detection, Hei Law, Yun Teng, Olga Russakovsky, Jia Deng, in: arXiv preprint arXiv:1904.08900, 2019. [OpenAccess], [Pytorch], CornerNet-Lite
RepPoints: Point Set Representation for Object Detection, Z. Yang, S. Liu, H. Hu, L. Wang, S. Lin, in: ICCV, 2019. [OpenAccess], RepPoints

6.1.2 Anchor-Refinement Methods

Yolo9000: better, faster, stronger, J. Redmon, A. Farhadi, in: CVPR, 2017. [OpenAccess], [DarkNet], YOLOv2
Cascade r-cnn: Delving into high quality object detection, Z. Cai, N. Vasconcelos, in: CVPR, 2018. [OpenAccess], [Caffe], [Caffe2], Cascade R-CNN
Single-shot refinement neural network for object detection, S. Zhang, L. Wen, X. Bian, Z. Lei, S. Z. Li, in: CVPR, 2018. [OpenAccess], [Caffe], RefineDet
Metaanchor: Learning to detect objects with customized anchors, T. Yang, X. Zhang, Z. Li, W. Zhang, J. Sun, in: NeurIPS, 2018. [OpenAccess], MetaAnchor
Derpn: Taking a further step toward more general object detection, L. Xie, Y. Liu, L. Jin, Z. Xie, in: AAAI, 2019. [OpenAccess], [Caffe], DeRPN
Region Proposal by Guided Anchoring, J. Wang, K. Chen, S. Yang, C. C. Loy, D. Lin, in: CVPR, 2019. [OpenAccess], [mmdetection]
Revisiting Feature Alignment for One-stage Object Detection, Y. Chen, C. Han, N. Wang, Z. Zhang, in: arXiv preprint arXiv:1908.01570, 2019, [OpenAccess], AlignDet
PosNeg-Balanced Anchors with Aligned Features for Single-Shot Object Detection, Qiankun Tang, Shice Liu, Jie Li, Yu Hu, in: arXiv preprint arXiv:1908.03295, 2019, [OpenAccess], [Pytorch], PADet
Cascade RetinaNet: Maintaining Consistency for Single-Stage Object Detection, Q. Tang, S. Liu, J. Li, Y. Hu, in: BMVC, 2019, [OpenAccess], CaRetinaNet

6.2 AutoML Detection

Nas-fpn: Learning scalable feature pyramid architecture for object detection, G. Ghiasi, T.-Y. Lin, Q. V. Le, in: CVPR, 2019. [OpenAccess], [TensorFlow], NAS-FPN
Detnas: Neural architecture search on object detection, Y. Chen, T. Yang, X. Zhang, G. Meng, C. Pan, J. Sun, in: arXiv preprint arXiv:1903.10979, 2019. [OpenAccess], DetNas
Learning data augmentation strategies for object detection, B. Zoph, E. D. Cubuk, G. Ghiasi, T.-Y. Lin, J. Shlens, Q. V. Le, in: arXiv preprint arXiv:1906.11172, 2019. [OpenAccess], [TensorFlow]
AutoAugment: Learning Augmentation Strategies from Data, E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, Q. V. Le, in: CVPR, 2019. [OpenAccess], AutoAugment

6.3 Low-shot Detection

Few-example object detection with model communication, X. Dong, L. Zheng, F. Ma, Y. Yang, D. Meng, in: TPAMI, 2018. [OpenAccess], [Project], MSPLD
Lstd: A low-shot transfer detector for object detection, H. Chen, Y. Wang, G. Wang, Y. Qiao, in: AAAI, 2018. [OpenAccess], [Caffe], LSTD
Repmet: Representative-based metric learning for classification and one-shot object detection, E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, S. Pankanti, R. Feris, A. Kumar, R. Giryes, A. M. Bronstein, in: CVPR, 2019. [OpenAccess], [Pytorch], RepMet

6.4 Others

Megdet: A large mini-batch object detector, C. Peng, T. Xiao, Z. Li, Y. Jiang, X. Zhang, K. Jia, G. Yu, J. Sun, in: CVPR, 2018. [OpenAccess], MegDet
Incremental learning of object detectors without catastrophic forgetting, K. Shmelkov, C. Schmid, K. Alahari, in: ICCV, 2017. [OpenAccess], [TensorFlow]