Collection of papers and other resources for object detection and tracking using deep learning
Object Detection
- Mask R-CNN (pdf, arxiv, github) by Facebook AI Research!
- TensorFlow Object Detection API: Only the two SSD nets can run at 12.5 FPS on one GTX 1080 Ti (less accurate than YOLO 604x604). The next two models run at 4-5 FPS (4-5% mAP better than YOLO). The best model runs at < 1 FPS. The code currently only allows inference on 1 image at a time. Speed might improve by 2.5 times once multiple-image inference is supported.
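
For a rough sense of what the figures in the note above would mean in practice, here is a back-of-the-envelope sketch. The function names are hypothetical, and the 2.5x factor is only the estimate quoted in the note, not a measured result.

```python
def projected_fps(single_image_fps: float, batch_speedup: float) -> float:
    """Throughput expected if batching yields the given speedup factor."""
    return single_image_fps * batch_speedup


def per_frame_latency_ms(fps: float) -> float:
    """Average time spent per frame, in milliseconds."""
    return 1000.0 / fps


# Figures quoted in the note: SSD at 12.5 FPS, hoped-for 2.5x gain from batching.
ssd_fps = 12.5
batched_fps = projected_fps(ssd_fps, 2.5)  # assumption: the 2.5x estimate holds

print(f"single-image: {ssd_fps} FPS, {per_frame_latency_ms(ssd_fps):.0f} ms/frame")
print(f"batched (projected): {batched_fps} FPS, {per_frame_latency_ms(batched_fps):.0f} ms/frame")
```

If the estimate held, the SSD nets would move from about 80 ms per frame to about 32 ms per frame, while the 4-5 FPS models would still fall short of real time.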
Object Tracking
Multi Object Tracking
- Learning to Track: Online Multi-object Tracking by Decision Making (ICCV 2015) (Stanford) (pdf, github (Matlab), project page)
- Tracking The Untrackable: Learning To Track Multiple Cues with Long-Term Dependencies (arxiv April 2017) (Stanford) (pdf, arxiv, project page)
- Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor (ICCV 2015) (NEC Labs) (pdf, author page)
- A Multi-cut Formulation for Joint Segmentation and Tracking of Multiple Objects (highest MT on MOT2015) (University of Freiburg, Germany) (pdf, arxiv, author page)
Single Object Tracking
- Deep Reinforcement Learning for Visual Object Tracking in Videos (arxiv April 2017) (UC Santa Barbara, Samsung Research) (pdf, arxiv, author page)
- Visual Tracking by Reinforced Decision Making (arxiv February 2017) (Seoul National University, Chung-Ang University) (pdf, arxiv, author page)
- Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning (CVPR 2017) (Seoul National University) (pdf, project page)
- End-to-end Active Object Tracking via Reinforcement Learning (arxiv 30 May 2017) (Peking University, Tencent AI Lab) (pdf, arxiv)
Other potentially useful papers
- Deep Feature Flow for Video Recognition (pdf, arxiv, github) by Microsoft Research
Datasets
- IDOT dataset
- UA-DETRAC Benchmark Suite
- GRAM Road-Traffic Monitoring
- Stanford Drone Dataset
- Ko-PER Intersection Dataset
- TRANCOS Dataset
- Urban Tracker Dataset
- DARPA VIVID / PETS 2005 dataset (non-stationary camera)
- KIT-AKS Dataset (no ground truth)
- CBCL StreetScenes Challenge Framework (no top-down viewpoint)
- MOT 2015 (mostly street-level camera viewpoint)
- MOT 2016 (mostly street-level camera viewpoint)
- MOT 2017 (mostly street-level camera viewpoint)
- PETS 2009 (no vehicles)
- PETS 2017 (low density; mostly pedestrians)
- KITTI Tracking Dataset (no top-down viewpoint; non-stationary camera)
Resources
- List of traffic surveillance datasets
- List of deep learning based tracking papers
- List of multi object tracking papers
- List of single object trackers with results on OTB
- List of Matlab frameworks, libraries and software
