Video Segmentation with Natural Language Query

Datasets, papers, and code for video segmentation with natural language queries

Datasets

A2D Sentences: Gavrilyuk, K., et al. Actor and action video segmentation from a sentence. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [Paper] [Project]
J-HMDB Sentences: Gavrilyuk, K., et al. Actor and action video segmentation from a sentence. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [Paper] [Project]

Refer-DAVIS: Khoreva, A., et al. Video object segmentation with language referring expressions. Proceedings of the 14th Asian Conference on Computer Vision (ACCV), 2018. [Paper] [Project]
Refer-YouTube-VOS: Seo, S., et al. URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark. Proceedings of the European Conference on Computer Vision (ECCV), 2020. [Paper] [Project]

Papers

Hu, Ronghang, et al. Segmentation from natural language expressions. ECCV, 2016. [Paper] [Code]
Li, Zhenyang, et al. Tracking by natural language specification. CVPR, 2017. [Paper] [Code]
Gavrilyuk, K., et al. Actor and action video segmentation from a sentence. CVPR, 2018. [Paper]
Wang, Hao, et al. Asymmetric cross-guided attention network for actor and action video segmentation from natural language query. ICCV, 2019. [Paper] [Project] [Code]
Wang, Hao, et al. Context modulated dynamic networks for actor and action video segmentation with language queries. AAAI, 2020. [Paper] [Project] [Code]
McIntosh, B., et al. Visual-Textual Capsule Routing for Text-Based Video Segmentation. CVPR, 2020. [Paper]
Ning, Ke, et al. Polar relative positional encoding for video-language segmentation. IJCAI, 2020. [Paper]
Bellver, M., et al. RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation. arXiv:2010.00263, 2020. [Paper]
Yang, Jianhua, et al. Actor and Action Modular Network for Text-based Video Segmentation. arXiv:2011.00786, 2020. [Paper]

Khoreva, A., et al. Video object segmentation with language referring expressions. ACCV, 2018. [Paper] [Project]
Seo, S., et al. URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark. ECCV, 2020. [Paper] [Project]
Bellver, M., et al. RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation. arXiv:2010.00263, 2020. [Paper]