Collection of papers and other resources for object detection and tracking using deep learning
- Region Proposal
- RCNN
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [tpami17] (pdf, notes)
- Mask R-CNN (pdf, notes, arxiv, code (keras), code (tensorflow)) [Facebook AI Research]
- YOLO
- SSD
- RetinaNet
- Misc
- Tubelet
- FGFA
- RNN
- Learning to Track: Online Multi-object Tracking by Decision Making (ICCV 2015) (Stanford) (pdf, code (Matlab), project page, notes)
- Tracking The Untrackable: Learning To Track Multiple Cues with Long-Term Dependencies (arxiv April 2017) (Stanford) (pdf, arxiv, project page, notes)
- Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor (ICCV 2015) (NEC Labs) (pdf, author page, notes)
- A Multi-cut Formulation for Joint Segmentation and Tracking of Multiple Objects (arxiv July 2016) (highest MT on MOT2015) (University of Freiburg, Germany) (pdf, arxiv, author page, notes)
- Deep Network Flow for Multi-Object Tracking (CVPR 2017) (NEC Labs) (pdf, supplementary, notes)
- Deep Reinforcement Learning for Visual Object Tracking in Videos (arxiv April 2017) (UC Santa Barbara, Samsung Research) (pdf, arxiv, author page, notes)
- Visual Tracking by Reinforced Decision Making (arxiv February 2017) (Seoul National University, Chung-Ang University) (pdf, arxiv, author page, notes)
- Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning (CVPR 2017) (Seoul National University) (pdf, supplementary, project page, notes)
- End-to-end Active Object Tracking via Reinforcement Learning (arxiv 30 May 2017) (Peking University, Tencent AI Lab) (pdf, arxiv)
- Video Frame Interpolation via Adaptive Convolution (CVPR 2017 / ICCV 2017) (pdf (cvpr17), pdf (iccv17), ppt)
- IDOT dataset
- UA-DETRAC Benchmark Suite
- GRAM Road-Traffic Monitoring
- Stanford Drone Dataset
- Ko-PER Intersection Dataset
- TRANCOS Dataset
- Urban Tracker Dataset
- DARPA VIVID / PETS 2005 dataset (non-stationary camera)
- KIT-AKS Dataset (no ground truth)
- CBCL StreetScenes Challenge Framework (no top-down viewpoint)
- MOT 2015 (mostly street-level camera viewpoint)
- MOT 2016 (mostly street-level camera viewpoint)
- MOT 2017 (mostly street-level camera viewpoint)
- PETS 2009 (no vehicles)
- PETS 2017 (low density; mostly pedestrians)
- KITTI Tracking Dataset (no top-down viewpoint; non-stationary camera)
- Datasets
- Single Object Tracking
- Multi Object Tracking
- Misc
- Static Detection
- Deep Learning for Object Detection: A Comprehensive Review
- Review of Deep Learning Algorithms for Object Detection
- A Simple Guide to the Versions of the Inception Network
- R-CNN, Fast R-CNN, Faster R-CNN, YOLO - Object Detection Algorithms
- A gentle guide to deep learning object detection
- The intuition behind RetinaNet
- YOLO—You only look once, real time object detection explained
- Understanding Feature Pyramid Networks for object detection (FPN)
- Fast object detection with SqueezeDet on Keras
- Region of interest pooling explained
- Video Detection
- Deep RL
Multi Object Tracking
- Combined Image- and World-Space Tracking in Traffic Scenes [ICRA 2017] [C++]
- Learning to Track: Online Multi-Object Tracking by Decision Making [ICCV 2015] [MATLAB]
- Multiple Hypothesis Tracking Revisited [ICCV 2015] [highest MT on MOT2015 among open source trackers] [MATLAB]
- Joint Tracking and Segmentation of Multiple Targets [CVPR 2015] [MATLAB]
- High-Speed Tracking-by-Detection Without Using Image Information [AVSS 2017] [Python]
- Continuous Energy Minimization for Multitarget Tracking [TPAMI 2014 / CVPR 2011 / ICCV 2011] [MATLAB]
- Robust online multi-object tracking based on tracklet confidence and online discriminative appearance learning [CVPR 2014] [MATLAB] (project)
- Discrete-Continuous Energy Minimization for Multi-Target Tracking [CVPR 2012] [MATLAB] (project)
- Multiple target tracking based on undirected hierarchical relation hypergraph [CVPR 2014] [C++] (author)
- Globally-optimal greedy algorithms for tracking a variable number of objects [CVPR 2011] [MATLAB] (author)
- The way they move: Tracking multiple targets with similar appearance [ICCV 2013] [MATLAB]
Single Object Tracking
- A collection of common tracking algorithms (2003-2012)
- Detect to Track and Track to Detect (ICCV 2017) [MATLAB]
- DeepTracking: Seeing Beyond Seeing Using Recurrent Neural Networks (AAAI 2016) [Torch 7]
- Hierarchical Convolutional Features for Visual Tracking (ICCV 2015) [MATLAB]
- Learning Multi-Domain Convolutional Neural Networks for Visual Tracking (Winner of The VOT2015 Challenge) [MATLAB/MatConvNet]
- RATM: Recurrent Attentive Tracking Model [Python]
- Visual Tracking with Fully Convolutional Networks (ICCV 2015) [MATLAB]
- Fully-Convolutional Siamese Networks for Object Tracking [TensorFlow]
- Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking (ECCV 2016) [MATLAB]
- ECO: Efficient Convolution Operators for Tracking (CVPR 2017) [MATLAB]
- End-to-end representation learning for Correlation Filter based tracking (CVPR 2017) [MATLAB]
- ROLO: Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking (ISCAS 2017) [TensorFlow]
- Deep SORT: Simple Online Realtime Tracking with a Deep Association Metric (ICIP 2017) [Python]
Static Detection and Matching
- Frameworks
- Tensorflow object detection API
- Only the two SSD nets run at 12.5 FPS on one GTX 1080 Ti (less accurate than YOLO 604x604). The next two models run at 4-5 FPS (4-5% mAP better than YOLO). The best model runs at < 1 FPS. The code currently allows inference on only one image at a time; speed might improve by about 2.5 times once multi-image inference is supported.
- Detectron
- Tensorflow object detection API
- SSD
- SSD-Tensorflow [tensorflow]
- SSD-Tensorflow (tf.estimator) [tensorflow]
- SSD-Tensorflow (tf.slim) [tensorflow]
- SSD-Keras [keras]
- SSD-Pytorch [pytorch]
- Enhanced SSD with Feature Fusion and Visual Reasoning [NCA18] [TensorFlow]
- RCNN
- PVANet: Lightweight Deep Neural Networks for Real-time Object Detection
- Mask R-CNN TensorFlow [TensorFlow]
- Mask R-CNN Keras [keras]
- Light-head R-CNN [cvpr18] [TensorFlow]
- Evolving Boxes for Fast Vehicle Detection [icme18] [Caffe/Python]
- YOLO
- Darknet: Convolutional Neural Networks [c/python]
- YOLO9000: Better, Faster, Stronger - Real-Time Object Detection. 9000 classes! [c/python]
- Darkflow [tensorflow]
- Pytorch Yolov2 [pytorch]
- RFCN
- RFCN (author) [caffe/matlab]
- RFCN-tensorflow [tensorflow]
- Region Proposal
- MCG: Multiscale Combinatorial Grouping - Object Proposals and Segmentation (project) [tpami16/cvpr14] [python]
- COB: Convolutional Oriented Boundaries (project) [tpami18/eccv16] [matlab/caffe]
- FPN
- Feature Pyramid Networks for Object Detection [caffe/python]
- Misc
- R-FCN: Object Detection via Region-based Fully Convolutional Networks
- Relation Networks for Object Detection [cvpr18] [MXNet]
- DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling [iccv17(poster)] [theano]
- Matching
- Frameworks
Video Detection
Optical Flow
Deep RL
Misc