Last Update: 2019/08/16
TODO:
- Fix the link and format issues
- Add paper link to SOTA tables
A list of awesome object detection resources.
Recently we released a survey, Recent Advances in Deep Learning for Object Detection, to the community. In it, we systematically analyze existing object detection frameworks and organize the material into three major parts: (i) detection components, (ii) learning strategies, and (iii) applications & benchmarks. We cover in detail a variety of factors that affect detection performance, such as detector architectures, feature learning, proposal generation, and sampling strategies. Finally, we discuss several directions for future research on visual object detection with deep learning.
After completing the survey, we decided to release the object detection resources we collected. We will keep updating both the survey and this collection, since the area moves so fast. If you have any questions or suggestions, please feel free to contact us.
Table of Contents
- 1. Generic Object Detection
- 2. Face Detection
- 3. Pedestrian Detection
- 4. Benchmarks
- 5. SOTA
- 6. Emerging Ideas
- 7. Other Resources
Citing this work
If this repository is useful, please cite our survey.
@article{wu2019recent,
title={Recent Advances in Deep Learning for Object Detection},
author={Wu, Xiongwei and Sahoo, Doyen and Hoi, Steven C.H.},
journal={arXiv preprint arXiv:1908.03673},
year={2019}
}
2014 CVPR
- Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, R. Girshick, J. Donahue, T. Darrell, J. Malik, [OpenAccess], [Supplementary], [Caffe],
RCNN
2014 ECCV
- Spatial pyramid pooling in deep convolutional networks for visual recognition, K. He, X. Zhang, S. Ren, J. Sun, [Arxiv], [Caffe-Matlab],
SPP-Net
2015 CVPR
- Deepid-net: Deformable deep convolutional neural networks for object detection, W. Ouyang, X. Wang, X. Zeng, S. Qiu, P. Luo, Y. Tian, H. Li, S. Yang, Z. Wang, C.-C. Loy, [OpenAccess]
- segdeepm: Exploiting segmentation and context in deep neural networks for object detection, Y. Zhu, R. Urtasun, R. Salakhutdinov, S. Fidler, [OpenAccess]
- Deformable part models are convolutional neural networks, R. Girshick, F. Iandola, T. Darrell, J. Malik, [OpenAccess]
2015 ICCV
- Fast r-cnn, R. Girshick, [OpenAccess], [Caffe-Python],
Fast R-CNN
- Object detection via a multi-region and semantic segmentation-aware cnn model, S. Gidaris, N. Komodakis, [OpenAccess], [Caffe],
MR-CNN
- Deepproposal: Hunting objects by cascading deep convolutional layers, A. Ghodrati, A. Diba, M. Pedersoli, T. Tuytelaars, L. Van Gool, [OpenAccess], [MatConvnet],
Deepproposal
2015 NeurIPS
- Faster r-cnn: Towards real-time object detection with region proposal networks, S. Ren, K. He, R. Girshick, J. Sun, [OpenAccess], [Arxiv], [Caffe-Matlab], [Caffe-Python], [Pytorch], [TensorFlow], [MXNet],
Faster R-CNN
2016 CVPR
- Hypernet: Towards accurate region proposal generation and joint object detection, T. Kong, A. Yao, Y. Chen, F. Sun, [OpenAccess],
HyperNet
- Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks, S. Bell, C. Lawrence Zitnick, K. Bala, R. Girshick, [OpenAccess],
ION
- Object detection from video tubelets with convolutional neural networks, K. Kang, W. Ouyang, H. Li, X. Wang, [OpenAccess], [Caffe],
T-CNN
- Instance-aware semantic segmentation via multitask network cascades, J. Dai, K. He, J. Sun, [OpenAccess], [Caffe],
MNC
- Adaptive object detection using adjacency and zoom prediction, Y. Lu, T. Javidi, S. Lazebnik, [Arxiv], [Caffe],
AZ-Net
- Training region-based object detectors with online hard example mining, A. Shrivastava, A. Gupta, R. Girshick, [OpenAccess], [Caffe],
OHEM
- Locnet: Improving localization accuracy for object detection, S. Gidaris, N. Komodakis, [OpenAccess], [Matlab],
LocNet
- Craft objects from images, B. Yang, J. Yan, Z. Lei, S. Z. Li, [OpenAccess], [Caffe],
CRAFT
2016 ECCV
- Contextual priming and feedback for faster r-cnn, A. Shrivastava, A. Gupta, [OpenAccess]
- Gated bi-directional cnn for object detection, X. Zeng, W. Ouyang, B. Yang, J. Yan, X. Wang, [OpenAccess]
2016 NeurIPS
- R-fcn: Object detection via region-based fully convolutional networks, J. Dai, Y. Li, K. He, J. Sun, [OpenAccess], [Caffe-Matlab], [Caffe-Python],
R-FCN
2016 Others
- Beyond skip connections: Top-down modulation for object detection, A. Shrivastava, R. Sukthankar, J. Malik, A. Gupta, in: arXiv preprint arXiv:1612.06851, 2016. [Arxiv],
TDM
- A multipath network for object detection, S. Zagoruyko, A. Lerer, T.-Y. Lin, P. O. Pinheiro, S. Gross, S. Chintala, P. Dollar, in: BMVC, 2016. [Arxiv], [Torch],
MultiPathNet
- Pvanet: deep but lightweight neural networks for real-time object detection, K.-H. Kim, S. Hong, B. Roh, Y. Cheon, M. Park, in: arXiv preprint arXiv:1608.08021, 2016. [Arxiv], [Caffe],
PVANet
2017 CVPR
- Feature pyramid networks for object detection, T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, S. Belongie, [OpenAccess], [Caffe2],
FPN
- Perceptual generative adversarial networks for small object detection, J. Li, X. Liang, Y. Wei, T. Xu, J. Feng, S. Yan, [OpenAccess],
PGAN
- A-fast-rcnn: Hard positive generation via adversary for object detection, X. Wang, A. Shrivastava, A. Gupta, [OpenAccess], [Caffe],
A-Fast-RCNN
- Mimicking very efficient network for object detection, Q. Li, S. Jin, J. Yan, [OpenAccess]
- Learning non-maximum suppression, J. Hosang, R. Benenson, B. Schiele, [OpenAccess], [TensorFlow]
- Speed/accuracy trade-offs for modern convolutional object detectors, J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, et al., [OpenAccess], [TensorFlow]
2017 ICCV
- Mask R-CNN, K. He, G. Gkioxari, P. Dollar, R. Girshick, [OpenAccess], [Caffe2], [Slides],
Mask R-CNN
- Denet: Scalable real-time object detection with directed sparse sampling, L. Tychsen-Smith, L. Petersson, [OpenAccess], [Theano],
DeNet
- Deformable convolutional networks, J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, Y. Wei, [OpenAccess], [MXNet],
DCN
- Couplenet: Coupling global structure with local parts for object detection, Y. Zhu, C. Zhao, J. Wang, X. Zhao, Y. Wu, H. Lu, [OpenAccess], [Caffe],
CoupleNet
- Spatial memory for context reasoning in object detection, X. Chen, A. Gupta, [OpenAccess],
SMN
- Soft-nms – improving object detection with one line of code, N. Bodla, B. Singh, R. Chellappa, L. S. Davis, [OpenAccess], [Caffe]
2017 Others
- Light-head rcnn: In defense of two-stage object detector, Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng, J. Sun, in: arXiv preprint arXiv:1711.07264, 2017. [Arxiv], [Pytorch], [TensorFlow]
- Zoom out-and-in network with recursive training for object proposal, H. Li, Y. Liu, W. Ouyang, X. Wang, in: arXiv preprint arXiv:1702.05711, 2017. [Arxiv]
2018 CVPR
- Cascade r-cnn: Delving into high quality object detection, Z. Cai, N. Vasconcelos, [OpenAccess], [Caffe], [Caffe2]
Cascade R-CNN
- Detnet: A backbone network for object detection, Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng, J. Sun, [OpenAccess], [Pytorch*],
DetNet
- An analysis of scale invariance in object detection–snip, B. Singh, L. S. Davis, [OpenAccess], [MXNet],
SNIP
- Multi-scale location-aware kernel representation for object detection, H. Wang, Q. Wang, M. Gao, P. Li, W. Zuo, [OpenAccess], [Caffe],
MLKR
- Feature selective networks for object detection, Y. Zhai, J. Fu, Y. Lu, H. Li, [OpenAccess]
- Pseudo mask augmented object detection, X. Zhao, S. Liang, Y. Wei, [OpenAccess]
- Structure inference net: Object detection using scene-level context and instance-level relationships, Y. Liu, R. Wang, S. Shan, X. Chen, [OpenAccess], [TensorFlow],
SIN
- Relation networks for object detection, H. Hu, J. Gu, Z. Zhang, J. Dai, Y. Wei, [OpenAccess], [MXNet]
- Path Aggregation Network for Instance Segmentation, S. Liu, L. Qi, H. Qin, J. Shi, J. Jia, [OpenAccess], [Pytorch]
2018 ECCV
- Acquisition of localization confidence for accurate object detection, B. Jiang, R. Luo, J. Mao, T. Xiao, Y. Jiang, [OpenAccess], [Pytorch],
IoU-Net
- Revisiting rcnn: On awakening the classification power of faster rcnn, B. Cheng, Y. Wei, H. Shi, R. Feris, J. Xiong, T. Huang, [OpenAccess], [MXNet]
- Learning region features for object detection, J. Gu, H. Hu, L. Wang, Y. Wei, J. Dai, [OpenAccess]
- Deep regionlets for object detection, H. Xu, X. Lv, X. Wang, Z. Ren, R. Chellappa, [OpenAccess]
- Context refinement for object detection, Z. Chen, S. Huang, D. Tao, [OpenAccess]
2018 NeurIPS
- Metaanchor: Learning to detect objects with customized anchors, T. Yang, X. Zhang, Z. Li, W. Zhang, J. Sun, [OpenAccess],
MetaAnchor
- Sniper: Efficient multi-scale training, B. Singh, M. Najibi, L. S. Davis, [OpenAccess], [MXNet],
SNIPER
2019 AAAI
- Derpn: Taking a further step toward more general object detection, L. Xie, Y. Liu, L. Jin, Z. Xie, [OpenAccess], [Caffe],
DeRPN
- Object Detection based on Region Decomposition and Assembly, S.-H. Bae, [OpenAccess],
R-DAD
2019 CVPR
- Mask scoring r-cnn, Z. Huang, L. Huang, Y. Gong, C. Huang, X. Wang, [OpenAccess], [Pytorch],
Mask Scoring R-CNN
- Deformable convnets v2: More deformable, better results, X. Zhu, H. Hu, S. Lin, J. Dai, [OpenAccess], [MXNet],
DCNv2
- Grid r-cnn, X. Lu, B. Li, Y. Yue, Q. Li, J. Yan, [OpenAccess], [mmdetection]
- Nas-fpn: Learning scalable feature pyramid architecture for object detection, G. Ghiasi, T.-Y. Lin, Q. V. Le, [OpenAccess], [TensorFlow],
NAS-FPN
- Bounding Box Regression with Uncertainty for Accurate Object Detection, Y. He, C. Zhu, J. Wang, M. Savvides, X. Zhang, [OpenAccess], [Caffe2],
KL-Loss
- Libra R-CNN: Towards Balanced Learning for Object Detection, J. Pang, K. Chen, J. Shi, H. Feng, W. Ouyang, D. Lin, [OpenAccess], [Pytorch], [mmdetection],
Libra R-CNN
- Region Proposal by Guided Anchoring, J. Wang, K. Chen, S. Yang, C. C. Loy, D. Lin, [OpenAccess], [mmdetection]
2019 ICCV
- Rethinking imagenet pre-training, K. He, R. Girshick, P. Dollar, [OpenAccess]
2019 Others
- Scale-aware trident networks for object detection, Y. Li, Y. Chen, N. Wang, Z. Zhang, in: arXiv preprint arXiv:1901.01892, 2019. [OpenAccess], [MXNet],
TridentNet
2019 NeurIPS
Before 2014
- Overfeat: Integrated recognition, localization and detection using convolutional networks, P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun, in: arXiv preprint arXiv:1312.6229, 2013. [Arxiv], [Torch],
Overfeat
2016 CVPR
- You only look once: Unified, real-time object detection, J. Redmon, S. Divvala, R. Girshick, A. Farhadi, [OpenAccess], [DarkNet],
YOLO
2016 ECCV
- SSD: Single shot multibox detector, W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, [OpenAccess], [Caffe],
SSD
2017 CVPR
- Yolo9000: better, faster, stronger, J. Redmon, A. Farhadi, [OpenAccess], [DarkNet],
YOLOv2
- Ron: Reverse connection with objectness prior networks for object detection, T. Kong, F. Sun, A. Yao, H. Liu, M. Lu, Y. Chen, [OpenAccess], [Caffe],
RON
2017 ICCV
- Focal loss for dense object detection, T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollar, [OpenAccess], [Caffe2],
RetinaNet
- Dsod: Learning deeply supervised object detectors from scratch, Z. Shen, Z. Liu, J. Li, Y.-G. Jiang, Y. Chen, X. Xue, [OpenAccess], [Caffe],
DSOD
2017 Others
- Dssd: Deconvolutional single shot detector, C.-Y. Fu, W. Liu, A. Ranga, A. Tyagi, A. C. Berg, in: arXiv preprint arXiv:1701.06659, 2017. [OpenAccess], [Caffe],
DSSD
- Residual features and unified prediction network for single stage detection, K. Lee, J. Choi, J. Jeong, N. Kwak, in: arXiv preprint arXiv:1707.05031, 2017. [OpenAccess]
- Enhancement of ssd by concatenating feature maps for object detection, J. Jeong, H. Park, N. Kwak, in: arXiv preprint arXiv:1705.09587, 2017. [OpenAccess]
- Fssd: Feature fusion single shot multibox detector, Z. Li, F. Zhou, in: arXiv preprint arXiv:1712.00960, 2017. [OpenAccess],
FSSD
- Learning object detectors from scratch with gated recurrent feature pyramids, Z. Shen, H. Shi, R. Feris, L. Cao, S. Yan, D. Liu, X. Wang, X. Xue, T. S. Huang, in: arXiv preprint arXiv:1712.00886, 2017. [OpenAccess], [Caffe]
2018 CVPR
- Single-shot refinement neural network for object detection, S. Zhang, L. Wen, X. Bian, Z. Lei, S. Z. Li, [OpenAccess], [Caffe],
RefineDet
- Scale-transferrable object detection, P. Zhou, B. Ni, C. Geng, J. Hu, Y. Xu, [OpenAccess], [Pytorch],
STDN
- Single-shot object detection with enriched semantics, Z. Zhang, S. Qiao, C. Xie, W. Shen, B. Wang, A. L. Yuille, [OpenAccess], [Caffe],
DES
2018 ECCV
- Cornernet: Detecting objects as paired keypoints, H. Law, J. Deng, [OpenAccess], [Pytorch],
CornerNet
- Receptive field block net for accurate and fast object detection, S. Liu, D. Huang, Y. Wang, [OpenAccess], [Pytorch],
RFBNet
- Deep feature pyramid reconfiguration for object detection, T. Kong, F. Sun, W. Huang, H. Liu, [OpenAccess]
2018 Others
- YOLOv3: An Incremental Improvement, J. Redmon, A. Farhadi, in: arXiv preprint arXiv:1804.02767, 2018. [OpenAccess], [DarkNet],
YOLOv3
- Mdssd: Multi-scale deconvolutional single shot detector for small objects, M. Xu, L. Cui, P. Lv, X. Jiang, J. Niu, B. Zhou, M. Wang, in: arXiv preprint arXiv:1805.07009, 2018. [Arxiv],
MDSSD
2019 AAAI
- M2det: A single-shot object detector based on multi-level feature pyramid network, Q. Zhao, T. Sheng, Y. Wang, Z. Tang, Y. Chen, L. Cai, H. Ling, [OpenAccess], [Pytorch],
M2Det
- Gradient harmonized single-stage detector, B. Li, Y. Liu, X. Wang, [OpenAccess], [mmdetection],
GHM
2019 CVPR
- Feature selective anchor-free module for single-shot object detection, C. Zhu, Y. He, M. Savvides, [OpenAccess],
FSAF
- Scratchdet: Exploring to train single-shot object detectors from scratch, R. Zhu, S. Zhang, X. Wang, L. Wen, H. Shi, L. Bo, T. Mei, [OpenAccess], [Caffe],
Scratchdet
- Bottom-up object detection by grouping extreme and center points, X. Zhou, J. Zhuo, P. Krahenbuhl, [OpenAccess], [Pytorch],
ExtremeNet
- Towards Accurate One-Stage Object Detection with AP-Loss, K. Chen, J. Li, W. Lin, J. See, J. Wang, L. Duan, Z. Chen, C. He, J. Zou, [OpenAccess],
AP-Loss
2019 ICCV
- Fcos: Fully convolutional one-stage object detection, Z. Tian, C. Shen, H. Chen, T. He, [OpenAccess], [Pytorch],
FCOS
- RepPoints: Point Set Representation for Object Detection, Z. Yang, S. Liu, H. Hu, L. Wang, S. Lin, [OpenAccess],
RepPoints
2019 Others
- Objects as points, X. Zhou, D. Wang, P. Krähenbühl, in: arXiv preprint arXiv:1904.07850, 2019, [Arxiv], [Pytorch],
CenterNet
- Centernet: Keypoint triplets for object detection, K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, Q. Tian, in: arXiv preprint arXiv:1904.08189, 2019, [Arxiv], [Pytorch],
CenterNet
- CornerNet-Lite: Efficient Keypoint Based Object Detection, H. Law, Y. Teng, O. Russakovsky, J. Deng, in: arXiv preprint arXiv:1904.08900, 2019, [OpenAccess], [Pytorch],
CornerNet-Lite
- Revisiting Feature Alignment for One-stage Object Detection, Y. Chen, C. Han, N. Wang, Z. Zhang, in: arXiv preprint arXiv:1908.01570, 2019, [OpenAccess],
AlignDet
- PosNeg-Balanced Anchors with Aligned Features for Single-Shot Object Detection, Q. Tang, S. Liu, J. Li, Y. Hu, in: arXiv preprint arXiv:1908.03295, 2019, [OpenAccess], [Pytorch],
PADet
- Cascade RetinaNet: Maintaining Consistency for Single-Stage Object Detection, H. Zhang, H. Chang, B. Ma, S. Shan, X. Chen, in: BMVC, 2019, [OpenAccess],
CaRetinaNet
- Joint face detection and alignment using multi-task cascaded convolutional networks, K. Zhang, Z. Zhang, Z. Li, Y. Qiao, in: IEEE Signal Processing Letters, 2016. [OpenAccess], [Caffe],
MTCNN
- Detecting faces using region-based fully convolutional networks, Y. Wang, X. Ji, Z. Zhou, H. Wang, Z. Li, in: arXiv preprint arXiv:1709.05256, 2017. [OpenAccess],
Face R-FCN
- Detecting faces using inside cascaded contextual cnn, K. Zhang, Z. Zhang, H. Wang, Z. Li, Y. Qiao, W. Liu, in: ICCV, 2017. [OpenAccess]
- Cms-rcnn: Contextual multiscale region-based cnn for unconstrained face detection, C. Zhu, Y. Zheng, K. Luu, M. Savvides, in: Deep Learning for Biometrics, 2017. [OpenAccess],
CMS-RCNN
- Face r-cnn, H. Wang, Z. Li, X. Ji, Y. Wang, in: arXiv preprint arXiv:1706.01061, 2017. [OpenAccess],
Face R-CNN
- Scale-aware face detection, Z. Hao, Y. Liu, H. Qin, J. Yan, X. Li, X. Hu, in: CVPR, 2017. [OpenAccess]
- Ssh: Single stage headless face detector, M. Najibi, P. Samangouei, R. Chellappa, L. Davis, in: ICCV, 2017. [OpenAccess], [Caffe],
SSH
- Feature agglomeration networks for single stage face detection, J. Zhang, X. Wu, J. Zhu, S. C. Hoi, in: arXiv preprint arXiv:1712.00721, 2017. [OpenAccess],
FANet
- Finding tiny faces, P. Hu, D. Ramanan, [OpenAccess], [MatConvNet],
HR
- S3fd: Single shot scale-invariant face detector, S. Zhang, X. Zhu, Z. Lei, H. Shi, X. Wang, S. Z. Li, [OpenAccess], [Caffe],
S3FD
- Recurrent scale approximation for object detection in cnn, Y. Liu, H. Li, J. Yan, F. Wei, X. Wang, X. Tang, [OpenAccess], [Caffe],
RSA
- Anchor cascade for efficient face detection, B. Yu, D. Tao, in: arXiv preprint arXiv:1805.03363, 2018. [OpenAccess]
- Face detection using improved faster rcnn, C. Zhang, X. Xu, D. Tu, in: arXiv preprint arXiv:1802.02142, 2018. [OpenAccess], [Caffe]
- Face-magnet: Magnifying feature maps to detect small faces, P. Samangouei, M. Najibi, L. Davis, R. Chellappa, in: arXiv preprint arXiv:1803.05258, 2018. [OpenAccess], [Caffe]
- Selective refinement network for high performance face detection, C. Chi, S. Zhang, J. Xing, Z. Lei, S. Z. Li, X. Zou, in: arXiv preprint arXiv:1809.02693, 2018. [OpenAccess], [Pytorch],
SRN
- Pyramidbox: A context-assisted single shot face detector, X. Tang, D. K. Du, Z. He, J. Liu, in: ECCV, 2018. [OpenAccess], [TensorFlow]
- Face detection using deep learning: An improved faster rcnn approach, X. Sun, P. Wu, S. C. Hoi, in: Neurocomputing, 2018. [OpenAccess]
- Seeing small faces from robust anchors perspective, C. Zhu, R. Tao, K. Luu, M. Savvides, [OpenAccess]
- Dsfd: Dual shot face detector, J. Li, Y. Wang, C. Wang, Y. Tai, J. Qian, J. Yang, C. Wang, J. Li, F. Huang, in: CVPR, 2019. [OpenAccess], [Pytorch],
DSFD
- Bilattice-based logical reasoning for human detection, V. D. Shet, J. Neumann, V. Ramesh, L. S. Davis, in: CVPR, 2007. [OpenAccess]
- Integral channel features, P. Dollar, Z. Tu, P. Perona, S. Belongie, in: BMVC, 2009. [OpenAccess], [Project],
ICF
- A structural filter approach to human detection, G. Duan, H. Ai, S. Lao, in: ECCV, 2010. [OpenAccess]
- Multi-cue pedestrian classification with partial occlusion handling, M. Enzweiler, A. Eigenstetter, B. Schiele, D. M. Gavrila, in: CVPR, 2010. [OpenAccess]
- A discriminative deep model for pedestrian detection with occlusion handling, W. Ouyang, X. Wang, in: CVPR, 2012. [OpenAccess]
- Modeling mutual visibility relationship in pedestrian detection, W. Ouyang, X. Zeng, X. Wang, in: CVPR, 2013. [OpenAccess]
- Single-pedestrian detection aided by multi-pedestrian detection, W. Ouyang, X. Wang, in: CVPR, 2013. [OpenAccess]
- Pedestrian detection with unsupervised multi-stage feature learning, P. Sermanet, K. Kavukcuoglu, S. Chintala, Y. LeCun, in: CVPR, 2013. [OpenAccess]
- Joint deep learning for pedestrian detection, W. Ouyang, X. Wang, in: ICCV, 2013. [OpenAccess]
- Handling occlusions with franken-classifiers, M. Mathias, R. Benenson, R. Timofte, L. Van Gool, in: ICCV, 2013. [OpenAccess]
- Ten years of pedestrian detection, what have we learned?, R. Benenson, M. Omran, J. Hosang, B. Schiele, in: ECCV, 2014. [OpenAccess]
- Detection and tracking of occluded people, S. Tang, M. Andriluka, B. Schiele, in: IJCV, 2014. [OpenAccess]
- Learning complexity-aware cascades for deep pedestrian detection, Z. Cai, M. Saberian, N. Vasconcelos, in: ICCV, 2015. [OpenAccess]
- Taking a deeper look at pedestrians, J. Hosang, M. Omran, R. Benenson, B. Schiele, [OpenAccess]
- Deep learning strong parts for pedestrian detection, Y. Tian, P. Luo, X. Wang, X. Tang, in: CVPR, 2015. [OpenAccess]
- A unified multi-scale deep convolutional neural network for fast object detection, Z. Cai, Q. Fan, R. S. Feris, N. Vasconcelos, [OpenAccess], [Caffe],
MSCNN
- Dave: A unified framework for fast vehicle detection and annotation, Y. Zhou, L. Liu, L. Shao, M. Mellor, in: ECCV, 2016. [OpenAccess]
- Is faster r-cnn doing well for pedestrian detection?, L. Zhang, L. Lin, X. Liang, K. He, in: ECCV, 2016. [OpenAccess], [Caffe]
- Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers, F. Yang, W. Choi, Y. Lin, [OpenAccess],
SDP-CRC
- Accurate single stage detector using recurrent rolling convolution, J. Ren, X. Chen, J. Liu, W. Sun, J. Pang, Q. Yan, Y.-W. Tai, L. Xu, [OpenAccess], [Caffe],
RRC
- What can help pedestrian detection?, J. Mao, T. Xiao, Y. Jiang, Z. Cao, in: CVPR, 2017. [OpenAccess]
- Learning cross-modal deep representations for robust pedestrian detection, D. Xu, W. Ouyang, E. Ricci, X. Wang, N. Sebe, in: CVPR, 2017. [OpenAccess], [Caffe],
CMT-CNN
- Repulsion loss: Detecting pedestrians in a crowd, X. Wang, T. Xiao, Y. Jiang, S. Shao, J. Sun, C. Shen, in: CVPR, 2018. [OpenAccess], [Pytorch]
- Bi-box regression for pedestrian detection and occlusion estimation, C. Zhou, J. Yuan, in: ECCV, 2018. [OpenAccess]
- Occlusion-aware r-cnn: Detecting pedestrians in a crowd, S. Zhang, L. Wen, X. Bian, Z. Lei, S. Z. Li, in: ECCV, 2018. [OpenAccess],
OR R-CNN
- Scale-aware fast r-cnn for pedestrian detection, J. Li, X. Liang, S. Shen, T. Xu, J. Feng, S. Yan, in: TMM, 2018. [Arxiv],
SAF R-CNN
- Pcn: Part and context information for pedestrian detection with cnns, S. Wang, J. Cheng, H. Liu, M. Tang, in: arXiv preprint arXiv:1804.04483, 2018. [OpenAccess]
- Pascal VOC: The pascal visual object classes (voc) challenge, M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman, [OpenAccess], [Project]
- ImageNet: Imagenet: A large-scale hierarchical image database, J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, [OpenAccess], [Project]
- MSCOCO: Microsoft COCO: Common Objects in Context, T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, C. L. Zitnick, [OpenAccess], [Project]
- Open Images: The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale, A. Kuznetsova, H. Rom, N. Alldrin, J. Uijlings, I. Krasin, J. Pont-Tuset, S. Kamali, S. Popov, M. Malloci, T. Duerig, et al., [OpenAccess], [Project]
- LVIS: Lvis: A dataset for large vocabulary instance segmentation, A. Gupta, P. Dollar, R. Girshick, [OpenAccess], [Project]
- WIDER FACE: Wider face: A face detection benchmark, S. Yang, P. Luo, C.-C. Loy, X. Tang, [OpenAccess], [Project]
- FDDB: Fddb: A benchmark for face detection in unconstrained settings, V. Jain, E. Learned-Miller, [OpenAccess], [Project]
- PASCAL FACE: The pascal visual object classes (voc) challenge, M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman, [OpenAccess], [Project]
- MALF: Fine-grained evaluation on face detection in the wild, B. Yang, J. Yan, Z. Lei, S. Z. Li, in: Automatic Face and Gesture Recognition (FG), [OpenAccess], [Project]
- AFW: Face detection, pose estimation and landmark localization in the wild, X. Zhu, D. Ramanan, [OpenAccess], [Project]
- CityPersons: Citypersons: A diverse dataset for pedestrian detection, S. Zhang, R. Benenson, B. Schiele, [OpenAccess], [Project]
- Caltech: Pedestrian detection: An evaluation of the state of the art, P. Dollar, C. Wojek, B. Schiele, P. Perona, [OpenAccess], [Project]
- ETH: Depth and appearance for mobile scene analysis, A. Ess, B. Leibe, L. Van Gool, [OpenAccess], [Project]
- INRIA: Histograms of oriented gradients for human detection, N. Dalal, B. Triggs, [OpenAccess], [Project]
- KITTI: Vision meets robotics: The kitti dataset, A. Geiger, P. Lenz, C. Stiller, R. Urtasun, [OpenAccess], [Project]
Method | Backbone | Proposed Year | Input size (test) | VOC2007 mAP (%) | VOC2012 mAP (%) |
---|---|---|---|---|---|
Two-stage | | | | | |
R-CNN | VGG-16 | 2014 | Arbitrary | 66.0∗ | 62.4† |
SPP-net | VGG-16 | 2014 | ~600 × 1000 | 63.1∗ | - |
Fast R-CNN | VGG-16 | 2015 | ~600 × 1000 | 70.0 | 68.4 |
Faster R-CNN | VGG-16 | 2015 | ~600 × 1000 | 73.2 | 70.4 |
MR-CNN | VGG-16 | 2015 | Multi-Scale | 78.2 | 73.9 |
Faster R-CNN | ResNet-101 | 2016 | ~600 × 1000 | 76.4 | 73.8 |
R-FCN | ResNet-101 | 2016 | ~600 × 1000 | 80.5 | 77.6 |
OHEM | VGG-16 | 2016 | ~600 × 1000 | 74.6 | 71.9 |
HyperNet | VGG-16 | 2016 | ~600 × 1000 | 76.3 | 71.4 |
ION | VGG-16 | 2016 | ~600 × 1000 | 79.2 | 76.4 |
CRAFT | VGG-16 | 2016 | ~600 × 1000 | 75.7 | 71.3† |
LocNet | VGG-16 | 2016 | ~600 × 1000 | 78.4 | 74.8† |
R-FCN w DCN | ResNet-101 | 2017 | ~600 × 1000 | 82.6 | - |
CoupleNet | ResNet-101 | 2017 | ~600 × 1000 | 82.7 | 80.4 |
DeNet512(wide) | ResNet-101 | 2017 | ~512 × 512 | 77.1 | 73.9 |
FPN-Reconfig | ResNet-101 | 2018 | ~600 × 1000 | 82.4 | 81.1 |
DeepRegionLet | ResNet-101 | 2018 | ~600 × 1000 | 83.3 | 81.3 |
DCN+R-CNN | ResNet-101+ResNet-152 | 2018 | Arbitrary | 84.0 | 81.2 |
One-stage | | | | | |
YOLOv1 | VGG-16 | 2016 | 448 × 448 | 66.4 | 57.9 |
SSD512 | VGG-16 | 2016 | 512 × 512 | 79.8 | 78.5 |
YOLOv2 | Darknet | 2017 | 544 × 544 | 78.6 | 73.5 |
DSSD513 | ResNet-101 | 2017 | 513 × 513 | 81.5 | 80.0 |
DSOD300 | DS/64-192-48-1 | 2017 | 300 × 300 | 77.7 | 76.3 |
RON384 | VGG-16 | 2017 | 384 × 384 | 75.4 | 73.0 |
STDN513 | DenseNet-169 | 2018 | 513 × 513 | 80.9 | - |
RefineDet512 | VGG-16 | 2018 | 512 × 512 | 81.8 | 80.1 |
RFBNet512 | VGG-16 | 2018 | 512 × 512 | 82.2 | - |
CenterNet | ResNet-101 | 2019 | 512 × 512 | 78.7 | - |
CenterNet | DLA | 2019 | 512 × 512 | 80.7 | - |
∗: The model is trained on the VOC2007 trainval set only. †: The model is trained on the VOC2012 trainval set only.
Method | Backbone | Year | AP | AP$_{50}$ | AP$_{75}$ | AP$_{S}$ | AP$_{M}$ | AP$_{L}$ |
---|---|---|---|---|---|---|---|---|
Two-stage | | | | | | | | |
Fast R-CNN | VGG-16 | 2015 | 19.7 | 35.9 | - | - | - | - |
Faster R-CNN | VGG-16 | 2015 | 21.9 | 42.7 | - | - | - | - |
OHEM | VGG-16 | 2016 | 22.6 | 42.5 | 22.2 | 5.0 | 23.7 | 37.9 |
ION | VGG-16 | 2016 | 23.6 | 43.2 | 23.6 | 6.4 | 24.1 | 38.3 |
OHEM++ | VGG-16 | 2016 | 25.5 | 45.9 | 26.1 | 7.4 | 27.7 | 40.3 |
R-FCN | ResNet-101 | 2016 | 29.9 | 51.9 | - | 10.8 | 32.8 | 45.0 |
Faster R-CNN+++ | ResNet-101 | 2016 | 34.9 | 55.7 | 37.4 | 15.6 | 38.7 | 50.9 |
Faster R-CNN w FPN | ResNet-101 | 2016 | 36.2 | 59.1 | 39.0 | 18.2 | 39.0 | 48.2 |
DeNet-101(wide) | ResNet-101 | 2017 | 33.8 | 53.4 | 36.1 | 12.3 | 36.1 | 50.8 |
CoupleNet | ResNet-101 | 2017 | 34.4 | 54.8 | 37.2 | 13.4 | 38.1 | 50.8 |
Faster R-CNN by G-RMI | Inception-ResNet-v2 | 2017 | 34.7 | 55.5 | 36.7 | 13.5 | 38.1 | 52.0 |
Deformable R-FCN | Aligned-Inception-ResNet | 2017 | 37.5 | 58.0 | 40.8 | 19.4 | 40.1 | 52.5 |
Mask-RCNN | ResNeXt-101 | 2017 | 39.8 | 62.3 | 43.4 | 22.1 | 43.2 | 51.2 |
umd det | ResNet-101 | 2017 | 40.8 | 62.4 | 44.9 | 23.0 | 43.4 | 53.2 |
Fitness-NMS | ResNet-101 | 2017 | 41.8 | 60.9 | 44.9 | 21.5 | 45.0 | 57.5 |
DCN w Relation Net | ResNet-101 | 2018 | 39.0 | 58.6 | 42.9 | - | - | - |
DeepRegionlets | ResNet-101 | 2018 | 39.3 | 59.8 | - | 21.7 | 43.7 | 50.9 |
C-Mask RCNN | ResNet-101 | 2018 | 42.0 | 62.9 | 46.4 | 23.4 | 44.7 | 53.8 |
Group Norm | ResNet-101 | 2018 | 42.3 | 62.8 | 46.2 | - | - | - |
DCN+R-CNN | ResNet-101+ResNet-152 | 2018 | 42.6 | 65.3 | 46.5 | 26.4 | 46.1 | 56.4 |
Cascade R-CNN | ResNet-101 | 2018 | 42.8 | 62.1 | 46.3 | 23.7 | 45.5 | 55.2 |
SNIP++ | DPN-98 | 2018 | 45.7 | 67.3 | 51.1 | 29.3 | 48.8 | 57.1 |
SNIPER++ | ResNet-101 | 2018 | 46.1 | 67.0 | 51.6 | 29.6 | 48.9 | 58.1 |
PANet++ | ResNeXt-101 | 2018 | 47.4 | 67.2 | 51.8 | 30.1 | 51.7 | 60.0 |
Grid R-CNN | ResNeXt-101 | 2019 | 43.2 | 63.0 | 46.6 | 25.1 | 46.5 | 55.2 |
DCN-v2 | ResNet-101 | 2019 | 44.8 | 66.3 | 48.8 | 24.4 | 48.1 | 59.6 |
DCN-v2++ | ResNet-101 | 2019 | 46.0 | 67.9 | 50.8 | 27.8 | 49.1 | 59.5 |
TridentNet | ResNet-101 | 2019 | 42.7 | 63.6 | 46.5 | 23.9 | 46.6 | 56.6 |
TridentNet | ResNet-101-Deformable | 2019 | 48.4 | 69.7 | 53.5 | 31.8 | 51.3 | 60.3 |
One-stage | | | | | | | | |
SSD512 | VGG-16 | 2016 | 28.8 | 48.5 | 30.3 | 10.9 | 31.8 | 43.5 |
RON384++ | VGG-16 | 2017 | 27.4 | 49.5 | 27.1 | - | - | - |
YOLOv2 | DarkNet-19 | 2017 | 21.6 | 44.0 | 19.2 | 5.0 | 22.4 | 35.5 |
SSD513 | ResNet-101 | 2017 | 31.2 | 50.4 | 33.3 | 10.2 | 34.5 | 49.8 |
DSSD513 | ResNet-101 | 2017 | 33.2 | 53.3 | 35.2 | 13.0 | 35.4 | 51.1 |
RetinaNet800++ | ResNet-101 | 2017 | 39.1 | 59.1 | 42.3 | 21.8 | 42.7 | 50.2 |
STDN513 | DenseNet-169 | 2018 | 31.8 | 51.0 | 33.6 | 14.4 | 36.1 | 43.4 |
FPN-Reconfig | ResNet-101 | 2018 | 34.6 | 54.3 | 37.3 | - | - | - |
RefineDet512 | ResNet-101 | 2018 | 36.4 | 57.5 | 39.5 | 16.6 | 39.9 | 51.4 |
RefineDet512++ | ResNet-101 | 2018 | 41.8 | 62.9 | 45.7 | 25.6 | 45.1 | 54.1 |
GHM SSD | ResNeXt-101 | 2018 | 41.6 | 62.8 | 44.2 | 22.3 | 45.1 | 55.3 |
CornerNet511 | Hourglass-104 | 2018 | 40.5 | 56.5 | 43.1 | 19.4 | 42.7 | 53.9 |
CornerNet511++ | Hourglass-104 | 2018 | 42.1 | 57.8 | 45.3 | 20.8 | 44.8 | 56.7 |
M2Det800 | VGG-16 | 2019 | 41.0 | 59.7 | 45.0 | 22.1 | 46.5 | 53.8 |
M2Det800++ | VGG-16 | 2019 | 44.2 | 64.6 | 49.3 | 29.2 | 47.9 | 55.1 |
ExtremeNet | Hourglass-104 | 2019 | 40.2 | 55.5 | 43.2 | 20.4 | 43.2 | 53.1 |
CenterNet-HG | Hourglass-104 | 2019 | 42.1 | 61.1 | 45.9 | 24.1 | 45.5 | 52.8 |
FCOS | ResNeXt-101 | 2019 | 42.1 | 62.1 | 45.2 | 25.6 | 44.9 | 52.0 |
FSAF | ResNeXt-101 | 2019 | 42.9 | 63.8 | 46.3 | 26.6 | 46.2 | 52.7 |
CenterNet511 | Hourglass-104 | 2019 | 44.9 | 62.4 | 48.1 | 25.6 | 47.4 | 57.4 |
CenterNet511++ | Hourglass-104 | 2019 | 47.0 | 64.5 | 50.7 | 28.9 | 49.9 | 58.9 |
- Denet: Scalable real-time object detection with directed sparse sampling, L. Tychsen-Smith, L. Petersson, in: ICCV, 2017. [OpenAccess], [Theano],
DeNet
- Cornernet: Detecting objects as paired keypoints, H. Law, J. Deng, in: ECCV, 2018. [OpenAccess], [Pytorch],
CornerNet
- Objects as points, X. Zhou, D. Wang, P. Krähenbühl, in: arXiv preprint arXiv:1904.07850, 2019. [Arxiv], [Pytorch],
CenterNet
- Centernet: Keypoint triplets for object detection, K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, Q. Tian, in: arXiv preprint arXiv:1904.08189, 2019. [Arxiv], [Pytorch],
CenterNet
- Bottom-up object detection by grouping extreme and center points, X. Zhou, J. Zhuo, P. Krahenbuhl, in: CVPR, 2019. [OpenAccess], [Pytorch],
ExtremeNet
- Feature selective anchor-free module for single-shot object detection, C. Zhu, Y. He, M. Savvides, in: CVPR, 2019. [OpenAccess],
FSAF
- Fcos: Fully convolutional one-stage object detection, Z. Tian, C. Shen, H. Chen, T. He, in: ICCV, 2019. [OpenAccess], [Pytorch],
FCOS
- CornerNet-Lite: Efficient Keypoint Based Object Detection, H. Law, Y. Teng, O. Russakovsky, J. Deng, in: arXiv preprint arXiv:1904.08900, 2019. [OpenAccess], [Pytorch],
CornerNet-Lite
- RepPoints: Point Set Representation for Object Detection, Z. Yang, S. Liu, H. Hu, L. Wang, S. Lin, in: ICCV, 2019. [OpenAccess],
RepPoints
- Yolo9000: better, faster, stronger, J. Redmon, A. Farhadi, in: CVPR, 2017. [OpenAccess], [DarkNet],
YOLOv2
- Cascade r-cnn: Delving into high quality object detection, Z. Cai, N. Vasconcelos, in: CVPR, 2018. [OpenAccess], [Caffe], [Caffe2]
Cascade R-CNN
- Single-shot refinement neural network for object detection, S. Zhang, L. Wen, X. Bian, Z. Lei, S. Z. Li, in: CVPR, 2018. [OpenAccess], [Caffe],
RefineDet
- Metaanchor: Learning to detect objects with customized anchors, T. Yang, X. Zhang, Z. Li, W. Zhang, J. Sun, in: NeurIPS, 2018. [OpenAccess],
MetaAnchor
- Derpn: Taking a further step toward more general object detection, L. Xie, Y. Liu, L. Jin, Z. Xie, in: AAAI, 2019. [OpenAccess], [Caffe],
DeRPN
- Region Proposal by Guided Anchoring, J. Wang, K. Chen, S. Yang, C. C. Loy, D. Lin, [OpenAccess], [mmdetection]
- Revisiting Feature Alignment for One-stage Object Detection, Y. Chen, C. Han, N. Wang, Z. Zhang, in: arXiv preprint arXiv:1908.01570, 2019, [OpenAccess],
AlignDet
- PosNeg-Balanced Anchors with Aligned Features for Single-Shot Object Detection, Q. Tang, S. Liu, J. Li, Y. Hu, in: arXiv preprint arXiv:1908.03295, 2019, [OpenAccess], [Pytorch],
PADet
- Cascade RetinaNet: Maintaining Consistency for Single-Stage Object Detection, H. Zhang, H. Chang, B. Ma, S. Shan, X. Chen, in: BMVC, 2019, [OpenAccess],
CaRetinaNet
- Nas-fpn: Learning scalable feature pyramid architecture for object detection, G. Ghiasi, T.-Y. Lin, Q. V. Le, [OpenAccess], [TensorFlow],
NAS-FPN
- Detnas: Neural architecture search on object detection, Y. Chen, T. Yang, X. Zhang, G. Meng, C. Pan, J. Sun, in: arXiv preprint arXiv:1903.10979, 2019. [OpenAccess],
DetNas
- Learning data augmentation strategies for object detection, B. Zoph, E. D. Cubuk, G. Ghiasi, T.-Y. Lin, J. Shlens, Q. V. Le, in: arXiv preprint arXiv:1906.11172, 2019. [OpenAccess], [TensorFlow]
- AutoAugment: Learning Augmentation Strategies from Data, E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, Q. V. Le, in: CVPR, 2019. [OpenAccess],
AutoAugment
- Few-example object detection with model communication, X. Dong, L. Zheng, F. Ma, Y. Yang, D. Meng, in: TPAMI, 2018. [OpenAccess], [Project],
MSPLD
- Lstd: A low-shot transfer detector for object detection, H. Chen, Y. Wang, G. Wang, Y. Qiao, in: AAAI, 2018. [OpenAccess], [Caffe],
LSTD
- Repmet: Representative-based metric learning for classification and one-shot object detection, E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, S. Pankanti, R. Feris, A. Kumar, R. Giryes, A. M. Bronstein, in: CVPR, 2019. [OpenAccess], [Pytorch],
RepMet
- Megdet: A large mini-batch object detector, C. Peng, T. Xiao, Z. Li, Y. Jiang, X. Zhang, K. Jia, G. Yu, J. Sun, in: CVPR, 2018. [OpenAccess],
Megdet
- Incremental learning of object detectors without catastrophic forgetting, K. Shmelkov, C. Schmid, K. Alahari, in: ICCV, 2017. [OpenAccess], [TensorFlow]