Vision-Language_Tracking_Paper_List

A paper list for vision-language tracking (continuously updated)

Datasets:

  • Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark, Xiao Wang, Xiujun Shu, Zhipeng Zhang, Bo Jiang, Yaowei Wang, Yonghong Tian, Feng Wu (CVPR21)
    [Paper] [Evaluation Toolkit & Github] [Project]

  • Tracking by Natural Language Specification, Zhenyang Li, Ran Tao, Efstratios Gavves, Cees G. M. Snoek, Arnold W.M. Smeulders (CVPR17)
    [Paper] [Github]

  • LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking, Heng Fan, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Hexin Bai, Yong Xu, Chunyuan Liao, Haibin Ling (CVPR19)
    [Paper] [Github] [Project]

Initialization Settings:

NL+BBox:

  • SNLT: Siamese Natural Language Tracker: Tracking by Natural Language Descriptions with Siamese Trackers, Qi Feng, Vitaly Ablavsky, Qinxun Bai, Stan Sclaroff (CVPR21)
    [Paper] [Code]

  • Divert More Attention to Vision-Language Tracking, Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing (NeurIPS22)
    [Paper] [Code]

NL & NL+BBox:

  • Tracking by Natural Language Specification, Zhenyang Li, Ran Tao, Efstratios Gavves, Cees G. M. Snoek, Arnold W.M. Smeulders (CVPR17)
    [Paper] [Github]

  • Grounding-Tracking-Integration, Zhengyuan Yang, Tushar Kumar, Tianlang Chen, Jingsong Su, and Jiebo Luo (TCSVT20)
    [Paper]

  • Real-time Visual Object Tracking with Natural Language Description, Qi Feng, Vitaly Ablavsky, Qinxun Bai, Guorong Li, and Stan Sclaroff (WACV20)
    [Paper]

  • Capsule-based Object Tracking with Natural Language Specification, Ding Ma, Xiangqian Wu (MM21)
    [Paper]

  • Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark, Xiao Wang, Xiujun Shu, Zhipeng Zhang, Bo Jiang, Yaowei Wang, Yonghong Tian, Feng Wu (CVPR21)
    [Paper] [Evaluation Toolkit & Github] [Project]

  • Cross-modal Target Retrieval for Tracking by Natural Language, Yihao Li, Jun Yu, Zhongpeng Cai, Yuwen Pan (CVPRW22)
    [Paper]

  • Joint Visual Grounding and Tracking with Natural Language Specification, Li Zhou, Zikun Zhou, Kaige Mao, and Zhenyu He (CVPR23)
    [Paper] [Github]