Deep Learning for Visual Tracking: A Comprehensive Survey (Extended version of the paper on arXiv)

This repository provides comprehensive comparisons of recent Deep Learning (DL)-based visual tracking methods on the OTB-2013, OTB-2015, VOT-2018, and LaSOT datasets (Raw Results on OTB Dataset, Raw Results on VOT Dataset, Raw Results on LaSOT Dataset).

Performance comparisons will be updated soon with more datasets and methods.

===============================================================================

License: Copyright 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

===============================================================================

Performance Comparison of Visual Trackers in terms of Precision and Success Plots on OTB-2013 Dataset (Ranking based on Area Under Curve (AUC)) [OTB-2013 Dataset] [OTB-2013 Paper]:

- Average Performance Comparisons of Visual Tracking Methods (a sketch of the success-plot AUC computation follows this list):

- Attribute-based Performance Comparisons (eleven attributes: Illumination Variation (IV), Scale Variation (SV), Occlusion (OCC), Deformation (DEF), Motion Blur (MB), Fast Motion (FM), In-Plane Rotation (IPR), Out-of-Plane Rotation (OPR), Out-of-View (OV), Background Clutter (BC), and Low Resolution (LR)):
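The success plot used for the AUC ranking above measures, for every overlap threshold, the fraction of frames whose predicted box overlaps the ground truth by more than that threshold. Below is a minimal sketch of this computation, assuming (x, y, w, h) boxes; the helper names `iou` and `success_curve` are ours, not part of the OTB toolkit.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2 = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    y2 = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def success_curve(pred_boxes, gt_boxes, thresholds=np.linspace(0, 1, 21)):
    """Success plot: fraction of frames with overlap above each threshold."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return np.array([(overlaps > t).mean() for t in thresholds])

# The AUC score used for ranking is the mean of the curve over the
# sampled thresholds:
# auc = success_curve(pred_boxes, gt_boxes).mean()
```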

Performance Comparison of Visual Trackers in terms of Precision and Success Plots on OTB-2015 Dataset (Ranking based on Area Under Curve (AUC)) [OTB-2015 Dataset][OTB-2015 Paper]:

- Average Performance Comparisons of Visual Tracking Methods (a sketch of the precision-plot computation follows this list):

- Attribute-based Performance Comparisons (eleven attributes: Illumination Variation (IV), Scale Variation (SV), Occlusion (OCC), Deformation (DEF), Motion Blur (MB), Fast Motion (FM), In-Plane Rotation (IPR), Out-of-Plane Rotation (OPR), Out-of-View (OV), Background Clutter (BC), and Low Resolution (LR)):
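Precision plots are built the same way from the center location error instead of the overlap. A minimal sketch under the same (x, y, w, h) box assumption; by OTB convention, the single reported precision score is the curve value at the 20-pixel threshold.

```python
import numpy as np

def center_error(box_a, box_b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ax, ay = box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0
    bx, by = box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0
    return np.hypot(ax - bx, ay - by)

def precision_curve(pred_boxes, gt_boxes, thresholds=np.arange(0, 51)):
    """Precision plot: fraction of frames whose center error falls
    within each pixel threshold."""
    errors = np.array([center_error(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return np.array([(errors <= t).mean() for t in thresholds])

# Reported precision score (20-pixel convention):
# precision = precision_curve(pred_boxes, gt_boxes)[20]
```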

Performance Comparison of Visual Trackers on VOT-2018 Dataset [VOT-2018 Dataset][VOT-2018 Paper]:

Experiment Baseline (Expected Overlap Analysis):

- Expected overlap curves:

- Expected overlap scores:

- Overview: Expected Overlap Analysis:
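To make the plots above easier to read: for each sequence length Ns, the expected overlap curve gives the overlap a tracker can be expected to achieve, on average, over the first Ns frames of a run, and the scalar expected average overlap (EAO) score averages this curve over an interval of typical sequence lengths. The sketch below is a simplified reconstruction of that idea, not the VOT toolkit's exact procedure (the toolkit's failure handling and interval bounds are more involved).

```python
import numpy as np

def expected_overlap_curve(per_run_overlaps, max_len):
    """Expected average overlap as a function of sequence length Ns.

    per_run_overlaps: one per-frame IoU array per tracking run; frames
    after a failure are treated as zero overlap (here via zero-padding).
    """
    curve = np.zeros(max_len)
    for ns in range(1, max_len + 1):
        # Mean overlap over the first ns frames of every run, padding
        # runs that failed before frame ns with zeros.
        means = [np.pad(o, (0, max(0, ns - len(o))))[:ns].mean()
                 for o in per_run_overlaps]
        curve[ns - 1] = np.mean(means)
    return curve

# The scalar EAO score averages the curve over an interval [n_lo, n_hi]
# of typical sequence lengths chosen by the toolkit:
# eao = expected_overlap_curve(runs, n_hi)[n_lo - 1:n_hi].mean()
```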

Experiment Baseline (Accuracy-Robustness (AR) Ranking):

- AR plot (mean):

- AR plot (weighted_mean):

- AR plot (pooled):

- Table: Accuracy:

- Table: Robustness:
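The AR plots above place each tracker in a two-dimensional space of accuracy (average overlap while tracking succeeds) and robustness (derived from the number of failures). Below is a simplified per-sequence sketch under the reset-based supervised protocol; it ignores the burn-in frames the VOT toolkit discards after each re-initialization.

```python
import numpy as np

def accuracy_robustness(segment_overlaps):
    """Per-sequence accuracy and failure count under a reset-based protocol.

    segment_overlaps: list of per-frame IoU arrays, one per tracking
    segment between re-initializations (each re-init marks one failure).
    """
    accuracy = np.concatenate(segment_overlaps).mean()  # mean IoU while tracking
    failures = len(segment_overlaps) - 1                # extra segments = failures
    return accuracy, failures
```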

Experiment Baseline (Attribute-based Ranking: Camera Motion, Illumination Change, Motion Change, Occlusion, Size Change, No Degradation):

- Orderings for overall overlap:

- Orderings for failures:

- AR plot for camera motion:

- AR plot for illumination change:

- AR plot for motion change:

- AR plot for occlusion:

- AR plot for size change:

- AR plot for no degradation:

Experiment Baseline (Speed Report) [the first- to third-ranked methods are shown in yellow, blue, and green, respectively]:

- Raw Frames per Second (FPS):

- Normalized (Equivalent Filter Operations (EFO): "tracker speed in terms of a predefined filtering operation that the VOT toolkit automatically carries out prior to running the experiments"):
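As the quoted definition indicates, EFO normalizes speed by timing a fixed filtering operation on the benchmark machine, so results are comparable across hardware. The sketch below illustrates the idea only; the max filter and the 600x600 / 30x30 sizes are our assumptions, not the toolkit's exact constants, and a real measurement would average several timing runs.

```python
import time
import numpy as np
from scipy.ndimage import maximum_filter

def baseline_op_seconds():
    """Time one predefined filtering operation used as a hardware probe
    (illustrative sizes, not the VOT toolkit's exact constants)."""
    img = np.random.rand(600, 600)
    t0 = time.perf_counter()
    maximum_filter(img, size=30)
    return time.perf_counter() - t0

def frame_time_in_efo_units(seconds_per_frame):
    """Per-frame tracking time expressed as multiples of the baseline
    operation; the normalized speed reported above is its inverse."""
    return seconds_per_frame / baseline_op_seconds()
```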

Experiment Unsupervised (Overall Comparisons):

- Average overlap:

- Table: Overlap Overview:

- Orderings for overall overlap:
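In the unsupervised experiment the tracker runs through each sequence once, without re-initialization after failures, so the average-overlap score above is simply the mean per-frame IoU (frames where the target is lost contribute zero). A one-function sketch:

```python
import numpy as np

def unsupervised_average_overlap(overlaps):
    """No-reset score: mean per-frame IoU over the whole sequence,
    with lost-target frames counted as zero overlap."""
    return float(np.mean(overlaps))
```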

Experiment Unsupervised (Attribute-based Comparisons):

- Overlap plot for camera motion:

- Overlap plot for illumination change:

- Overlap plot for motion change:

- Overlap plot for occlusion:

- Overlap plot for size change:

- Overlap plot for no degradation:

Experiment Unsupervised (Speed Report) [the first- to third-ranked methods are shown in yellow, blue, and green, respectively]:

- Raw Frames per Second (FPS):

- Normalized (Equivalent Filter Operations (EFO): "tracker speed in terms of a predefined filtering operation that the VOT toolkit automatically carries out prior to running the experiments"):

Overall Comparison:

- Report Overview:

Qualitative Comparisons of State-of-the-Art Visual Tracking Methods on the VOT-2018 Dataset (Under the TraX Protocol):

BMX_Video, Crabs1_Video, Gymnastics3_Video, Motocross2_Video, Singer3_Video, Godfather_Video, Bag_Video, Dinosaur_Video, Matrix_Video, Hand_Video, Glove_Video, Ball2_Video, Blanket_Video, Gymnastics1_Video, Butterfly_Video, Motocross1_Video, Pedestrian_Video, Singer2_Video, Shaking_Video, Racing_Video, Handball1_Video, Sheep_Video, Bolt1_Video, Fernando_Video, Bolt2_Video, Book_Video, Leaves_Video, Fish1_Video, Fish2_Video, Tiger_Video, Wiper_Video, Traffic_Video, Crossing_Video, Fish3_Video, Ball1_Video, Graduate_Video, Iceskater_Video, Soldier_Video, DroneAcross_Video, Soccer2_Video, DroneFlip_Video, Ants1_Video, Handball2_Video, Nature_Video, Ants3_Video, Road_Video, Helicopter_Video, Girl_Video, Gymnastics2_Video, Conduction_Video, Zebrafish1_Video, Basketball_Video, Frisbee_Video, Car1_Video, Birds1_Video, Drone1_Video, Flamingo_Video.

Performance Comparison of Visual Trackers in terms of Precision and Success Plots on LaSOT Dataset (Ranking based on Area Under Curve (AUC)) [LaSOT Dataset][LaSOT Paper]:

- Average Performance Comparisons of Visual Tracking Methods:

- Attribute-based Performance Comparisons (fourteen attributes: Illumination Variation (IV), Scale Variation (SV), Deformation (DEF), Motion Blur (MB), Fast Motion (FM), Out-of-View (OV), Background Clutter (BC), Low Resolution (LR), Aspect Ratio Change (ARC), Camera Motion (CM), Full Occlusion (FOC), Partial Occlusion (POC), Viewpoint Change (VC), and Rotation (ROT)):

References (Experimentally Evaluated Visual Tracking Methods):

[1] C. Ma, J. B. Huang, X. Yang, and M. H. Yang, “Hierarchical convolutional features for visual tracking,” in Proc. IEEE ICCV, 2015, pp. 3074–3082. [HCFT]

[2] M. Danelljan, G. Hager, F. S. Khan, and M. Felsberg, “Convolutional features for correlation filter based visual tracking,” in Proc. IEEE ICCVW, 2016, pp. 621–629. [DeepSRDCF]

[3] M. Danelljan, A. Robinson, F. S. Khan, and M. Felsberg, “Beyond correlation filters: Learning continuous convolution operators for visual tracking,” in Proc. ECCV, vol. 9909 LNCS, 2016, pp. 472–488. [CCOT]

[4] L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, and P. H. Torr, “Fully-convolutional Siamese networks for object tracking,” in Proc. ECCV, 2016, pp. 850–865. [SiamFC]

[5] R. Tao, E. Gavves, and A. W. Smeulders, “Siamese instance search for tracking,” in Proc. IEEE CVPR, 2016, pp. 1420–1429. [SINT]

[6] H. Nam and B. Han, “Learning multi-domain convolutional neural networks for visual tracking,” in Proc. IEEE CVPR, 2016, pp. 4293–4302. [MDNet]

[7] Y. Qi, S. Zhang, L. Qin, H. Yao, Q. Huang, J. Lim, and M. H. Yang, “Hedged deep tracking,” in Proc. IEEE CVPR, 2016, pp. 4303–4311. [HDT]

[8] H. Fan and H. Ling, “Parallel tracking and verifying: A framework for real-time and high accuracy visual tracking,” in Proc. IEEE ICCV, 2017, pp. 5487–5495. [PTAV]

[9] H. Fan and H. Ling, “Parallel tracking and verifying,” IEEE Trans. Image Process., vol. 28, no. 8, pp. 4130–4144, 2019. [PTAV]

[10] Z. Zhu, G. Huang, W. Zou, D. Du, and C. Huang, “UCT: Learning unified convolutional networks for real-time visual tracking,” in Proc. IEEE ICCVW, 2018, pp. 1973–1982. [UCT]

[11] Q. Guo, W. Feng, C. Zhou, R. Huang, L. Wan, and S. Wang, “Learning dynamic Siamese network for visual object tracking,” in Proc. IEEE ICCV, 2017, pp. 1781–1789. [DSiam]

[12] J. Valmadre, L. Bertinetto, J. Henriques, A. Vedaldi, and P. H. Torr, “End-to-end representation learning for correlation filter based tracking,” in Proc. IEEE CVPR, 2017, pp. 5000–5008. [CFNet]

[13] M. Danelljan, G. Bhat, F. Shahbaz Khan, and M. Felsberg, “ECO: Efficient convolution operators for tracking,” in Proc. IEEE CVPR, 2017, pp. 6931–6939. [ECO]

[14] A. Lukežič, T. Vojíř, L. Čehovin Zajc, J. Matas, and M. Kristan, “Discriminative correlation filter tracker with channel and spatial reliability,” IJCV, vol. 126, no. 7, pp. 671–688, 2018. [DeepCSRDCF]

[15] T. Zhang, C. Xu, and M. H. Yang, “Multi-task correlation particle filter for robust object tracking,” in Proc. IEEE CVPR, 2017, pp. 4819–4827. [MCPF]

[16] J. Choi, H. J. Chang, S. Yun, T. Fischer, Y. Demiris, and J. Y. Choi, “Attentional correlation filter network for adaptive visual tracking,” in Proc. IEEE CVPR, 2017, pp. 4828–4837. [ACFN]

[17] Q. Wang, J. Gao, J. Xing, M. Zhang, and W. Hu, “DCFNet: Discriminant correlation filters network for visual tracking,” 2017. [Online]. [DCFNet][DCFNet2]

[18] X. Dong and J. Shen, “Triplet loss in Siamese network for object tracking,” in Proc. ECCV, vol. 11217 LNCS, 2018, pp. 472–488. [TripletLoss-CFNet][TripletLoss-SiamFC][TripletLoss-CFNet2]

[19] G. Bhat, J. Johnander, M. Danelljan, F. S. Khan, and M. Felsberg, “Unveiling the power of deep tracking,” in Proc. ECCV, 2018, pp. 493–509. [UPDT]

[20] Z. Zhu, Q. Wang, B. Li, W. Wu, J. Yan, and W. Hu, “Distractor-aware Siamese networks for visual object tracking,” in Proc. ECCV, vol. 11213 LNCS, 2018, pp. 103–119. [DaSiamRPN]

[21] Y. Zhang, L. Wang, J. Qi, D. Wang, M. Feng, and H. Lu, “Structured Siamese network for real-time visual tracking,” in Proc. ECCV, 2018, pp. 355–370. [StructSiam]

[22] H. Morimitsu, “Multiple context features in Siamese networks for visual object tracking,” in Proc. ECCVW, 2019, pp. 116–131. [Siam-MCF]

[23] J. Choi, H. J. Chang, T. Fischer, S. Yun, K. Lee, J. Jeong, Y. Demiris, and J. Y. Choi, “Context-aware deep feature compression for high-speed visual tracking,” in Proc. IEEE CVPR, 2018, pp. 479–488. [TRACA]

[24] Y. Song, C. Ma, X. Wu, L. Gong, L. Bao, W. Zuo, C. Shen, R. W. Lau, and M. H. Yang, “VITAL: Visual tracking via adversarial learning,” in Proc. IEEE CVPR, 2018, pp. 8990–8999. [VITAL]

[25] F. Li, C. Tian, W. Zuo, L. Zhang, and M. H. Yang, “Learning spatial-temporal regularized correlation filters for visual tracking,” in Proc. IEEE CVPR, 2018, pp. 4904–4913. [DeepSTRCF]

[26] B. Li, J. Yan, W. Wu, Z. Zhu, and X. Hu, “High performance visual tracking with Siamese region proposal network,” in Proc. IEEE CVPR, 2018, pp. 8971–8980. [SiamRPN]

[27] A. He, C. Luo, X. Tian, and W. Zeng, “A twofold Siamese network for real-time object tracking,” in Proc. IEEE CVPR, 2018, pp. 4834–4843. [SA-Siam]

[28] C. Sun, D. Wang, H. Lu, and M. H. Yang, “Learning spatial-aware regressions for visual tracking,” in Proc. IEEE CVPR, 2018, pp. 8962–8970. [LSART]

[29] C. Sun, D. Wang, H. Lu, and M. H. Yang, “Correlation tracking via joint discrimination and reliability learning,” in Proc. IEEE CVPR, 2018, pp. 489–497. [DRT]

[30] S. Pu, Y. Song, C. Ma, H. Zhang, and M. H. Yang, “Deep attentive tracking via reciprocative learning,” in Proc. NIPS, 2018, pp. 1931–1941. [DAT]

[31] C. Ma, J. B. Huang, X. Yang, and M. H. Yang, “Robust visual tracking via hierarchical convolutional features,” IEEE Trans. Pattern Anal. Mach. Intell., 2018. [HCFTs]

[32] C. Ma, J. B. Huang, X. Yang, and M. H. Yang, “Adaptive correlation filters with long-term and short-term memory for object tracking,” IJCV, vol. 126, no. 8, pp. 771–796, 2018. [LCTdeep]

[33] E. Gundogdu and A. A. Alatan, “Good features to correlate for visual tracking,” IEEE Trans. Image Process., vol. 27, no. 5, pp. 2526–2540, 2018. [CFCF]

[34] H. Fan and H. Ling, “Siamese cascaded region proposal networks for real-time visual tracking,” in Proc. IEEE CVPR, 2019. [C-RPN]

[35] J. Gao, T. Zhang, and C. Xu, “Graph convolutional tracking,” in Proc. IEEE CVPR, 2019, pp. 4649–4659. [GCT]

[36] Q. Wang, L. Zhang, L. Bertinetto, W. Hu, and P. H. S. Torr, “Fast online object tracking and segmentation: A unifying approach,” in Proc. IEEE CVPR, 2019. [SiamMask]

[37] B. Li, W. Wu, Q. Wang, F. Zhang, J. Xing, and J. Yan, “SiamRPN++: Evolution of Siamese visual tracking with very deep networks,” in Proc. IEEE CVPR, 2019. [SiamRPN++]

[38] X. Li, C. Ma, B. Wu, Z. He, and M.-H. Yang, “Target-aware deep tracking,” in Proc. IEEE CVPR, 2019. [TADT]

[39] K. Dai, D. Wang, H. Lu, C. Sun, and J. Li, “Visual tracking via adaptive spatially-regularized correlation filters,” in Proc. IEEE CVPR, 2019, pp. 4670–4679. [ASRCF]

[40] Z. Zhang and H. Peng, “Deeper and wider Siamese networks for real-time visual tracking,” in Proc. IEEE CVPR, 2019. [SiamDW-SiamRPN][SiamDW-SiamFC]

[41] Y. Song, C. Ma, L. Gong, J. Zhang, R. W. Lau, and M. H. Yang, “CREST: Convolutional residual learning for visual tracking,” in Proc. IEEE ICCV, 2017, pp. 2574–2583. [CREST][Meta-CREST]