
Awesome Referring Expression Comprehension

Inspired by awesome-grounding and the survey by Qiao et al. listed below.

A curated list of research papers in Referring Expression Comprehension (REC). Link to the code and website if available is also present.

Table of Contents

  • Paper List: Survey, Dataset, arXiv, 2020, 2019, 2018, 2017, 2016
  • Contributing
  • Acknowledgement

Paper List

Survey

  • Referring Expression Comprehension: A Survey of Methods and Datasets. Yanyuan Qiao, Chaorui Deng, and Qi Wu. arXiv, 2020. [Paper]

Dataset

  • [RefCOCOg] Generation and Comprehension of Unambiguous Object Descriptions. Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, and Kevin Murphy. CVPR, 2016. [Paper] [Code]
  • [RefCOCO, RefCOCO+] Modeling context in referring expressions. Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, and Tamara L. Berg. ECCV, 2016. [Paper] [Code]
  • [CLEVR-Ref+] CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions. Runtao Liu, Chenxi Liu, Yutong Bai, and Alan Yuille. CVPR, 2019. [Paper] [Code] [Website]
  • [Cops-Ref] Cops-Ref: A New Dataset and Task on Compositional Referring Expression Comprehension. Zhenfang Chen, Peng Wang, Lin Ma, Kwan-Yee K. Wong, and Qi Wu. CVPR, 2020. [Paper] [Code]
  • [Ref-Reasoning] Graph-Structured Referring Expression Reasoning in The Wild. Sibei Yang, Guanbin Li, and Yizhou Yu. CVPR, 2020. [Paper] [Code] [Website]

arXiv

  • (TransVG) TransVG: End-to-End Visual Grounding with Transformers. Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wengang Zhou, and Houqiang Li. arXiv, 2021. [Paper]
  • (ECIFA) Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge. Peng Wang, Dongyang Liu, Hui Li, and Qi Wu. arXiv, 2020. [Paper]
  • (JVGN) Joint Visual Grounding with Language Scene Graphs. Daqing Liu, Hanwang Zhang, Zheng-Jun Zha, Meng Wang, and Qianru Sun. arXiv, 2019. [Paper] (I am an author of the paper)
  • A Real-time Global Inference Network for One-stage Referring Expression Comprehension. Yiyi Zhou et al. arXiv, 2019. [Paper] [Code]
  • (SSG) Real-Time Referring Expression Comprehension by Single-Stage Grounding Network. Xinpeng Chen, Lin Ma, Jingyuan Chen, Zequn Jie, Wei Liu, and Jiebo Luo. arXiv, 2018. [Paper]

2020

  • Improving One-stage Visual Grounding by Recursive Sub-query Construction. Zhengyuan Yang, Tianlang Chen, Liwei Wang, and Jiebo Luo. ECCV, 2020. [Paper] [Code]
  • (LSCM) Linguistic Structure Guided Context Modeling for Referring Image Segmentation. Tianrui Hui et al. ECCV, 2020. [Paper]
  • (BiLingUNet) BiLingUNet: Image Segmentation by Modulating Top-Down and Bottom-Up Visual Processing with Referring Expressions. Ozan Arkan Can, İlker Kesen, and Deniz Yuret. ECCV, 2020. [Paper]
  • (SGMN) Graph-Structured Referring Expression Reasoning in The Wild. Sibei Yang, Guanbin Li, and Yizhou Yu. CVPR, 2020. [Paper] [Code] [Website]
  • (MCN) Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation. Gen Luo et al. CVPR, 2020. [Paper] [Code]
  • (RCCF) A Real-Time Cross-modality Correlation Filtering Method for Referring Expression Comprehension. Yue Liao et al. CVPR, 2020. [Paper]
  • (LCMCG) Learning Cross-modal Context Graph for Visual Grounding. Yongfei Liu, Bo Wan, Xiaodan Zhu, and Xuming He. AAAI, 2020. [Code]

2019

  • (NMTree) Learning to Assemble Neural Module Tree Networks for Visual Grounding. Daqing Liu, Hanwang Zhang, Feng Wu, and Zheng-Jun Zha. ICCV, 2019. [Paper] [Code] (I am an author of the paper)
  • (RvG-Tree) Learning to Compose and Reason with Language Tree Structures for Visual Grounding. Richang Hong, Daqing Liu, Xiaoyu Mo, Xiangnan He, and Hanwang Zhang. TPAMI, 2019. [Paper] (I am an author of the paper)
  • (FAOA) A Fast and Accurate One-Stage Approach to Visual Grounding. Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, and Jiebo Luo. ICCV, 2019. [Paper] [Code]
  • (DGA) Dynamic Graph Attention for Referring Expression Comprehension. Sibei Yang, Guanbin Li, and Yizhou Yu. ICCV, 2019. [Paper] [Code]
  • (LCGN) Language-Conditioned Graph Networks for Relational Reasoning. Ronghang Hu, Anna Rohrbach, Trevor Darrell, and Kate Saenko. ICCV, 2019. [Paper] [Code]
  • See-Through-Text Grouping for Referring Image Segmentation. Ding-Jie Chen, Songhao Jia, Yi-Chen Lo, Hwann-Tzong Chen, and Tyng-Luh Liu. ICCV, 2019. [Paper]
  • (CMRIN) Cross-Modal Relationship Inference for Grounding Referring Expressions. Sibei Yang, Guanbin Li, and Yizhou Yu. CVPR, 2019. [Paper]
  • (CM-Att-Erase) Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing. Xihui Liu, Zihao Wang, Jing Shao, Xiaogang Wang, and Hongsheng Li. CVPR, 2019. [Paper]
  • (CMSA) Cross-Modal Self-Attention Network for Referring Image Segmentation. Linwei Ye, Mrigank Rochan, Zhi Liu, and Yang Wang. CVPR, 2019. [Paper] [Code]

2018

  • (Multi-hop FiLM) Visual Reasoning with Multi-hop Feature Modulation. Florian Strub, Mathieu Seurin, Ethan Perez, and Harm de Vries. ECCV, 2018. [Paper]
  • (DDPN) Rethinking diversified and discriminative proposal generation for visual grounding. Zhou Yu, Jun Yu, Chenchao Xiang, Zhou Zhao, Qi Tian, and Dacheng Tao. IJCAI, 2018. [Paper] [Code]
  • (MAttNet) MAttNet: Modular Attention Network for Referring Expression Comprehension. Licheng Yu et al. CVPR, 2018. [Paper] [Code] [Website]
  • (AccumAttn) Visual Grounding via Accumulated Attention. Chaorui Deng, Qi Wu, Qingyao Wu, Fuyuan Hu, Fan Lyu, and Mingkui Tan. CVPR, 2018. [Paper]
  • (ParalAttn) Parallel Attention: A Unified Framework for Visual Object Discovery Through Dialogs and Queries. Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, and Anton van den Hengel. CVPR, 2018. [Paper] [Code]
  • (LGRAN) Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks. Peng Wang, Qi Wu, Jiewei Cao, Chunhua Shen, Lianli Gao, and Anton van den Hengel. CVPR, 2018. [Paper]
  • (VariContext) Grounding Referring Expressions in Images by Variational Context. Hanwang Zhang, Yulei Niu, and Shih-Fu Chang. CVPR, 2018. [Paper] [Code]
  • (GroundNet) Using Syntax to Ground Referring Expressions in Natural Images. Volkan Cirik, Taylor Berg-Kirkpatrick, and Louis-Philippe Morency. AAAI, 2018. [Paper] [Code]

2017

  • Recurrent Multimodal Interaction for Referring Image Segmentation. Chenxi Liu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, and Alan Yuille. ICCV, 2017. [Paper] [Code]
  • (Attribute) Referring Expression Generation and Comprehension via Attributes. Jingyu Liu, Liang Wang, and Ming-Hsuan Yang. ICCV, 2017. [Paper]
  • (CMN) Modeling relationships in referential expressions with compositional modular networks. Ronghang Hu, Marcus Rohrbach, Jacob Andreas, Trevor Darrell, and Kate Saenko. CVPR, 2017. [Paper] [Code]
  • (Spe+Lis+RI) A Joint Speaker-Listener-Reinforcer Model for Referring Expressions. Licheng Yu, Hao Tan, Mohit Bansal, and Tamara L. Berg. CVPR, 2017. [Paper] [Code] [Website]
  • Comprehension-guided referring expressions. Ruotian Luo and Gregory Shakhnarovich. CVPR, 2017. [Paper] [Code]

2016

  • (MCB) Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach. EMNLP, 2016. [Paper] [Code]
  • (NegBag) Modeling context between objects for referring expression understanding. Varun K. Nagaraja, Vlad I. Morariu, and Larry S. Davis. ECCV, 2016. [Paper] [Code]
  • (VisDif) Modeling context in referring expressions. Licheng Yu, Patrick Poirson, Shan Yang, Alexander C. Berg, and Tamara L. Berg. ECCV, 2016. [Paper] [Code]
  • (SCRC) Natural Language Object Retrieval. Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng, Kate Saenko, and Trevor Darrell. CVPR, 2016. [Paper] [Code] [Website]
  • (MMI) Generation and Comprehension of Unambiguous Object Descriptions. Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan Yuille, and Kevin Murphy. CVPR, 2016. [Paper] [Code]

Contributing

Please feel free to contact me via email (liudq@mail.ustc.edu.cn), open an issue, or submit a pull request.

To add a new paper via pull request:

  1. Fork the repo, edit README.md.
  2. Insert the new paper at the correct chronological position, using the following format:
    - **Paper Title**. *Author(s)*. Conference, Year. [[Paper]](link) [[Code]](link) [[Website]](link)
    
  3. Send a pull request. I will typically review it within a week.
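As a rough sanity check before opening a pull request, a small script could verify that a new line follows the entry template above. This is a hypothetical helper, not part of this repo; the regex, the `check_entry` name, and the example links are all my own placeholders.

```python
import re

# Pattern for the template:
# - **Paper Title**. *Author(s)*. Conference, Year. [[Paper]](link) [[Code]](link) ...
ENTRY = re.compile(
    r"^\s*-\s+\*\*(?P<title>.+?)\*\*\.\s+"       # bold paper title
    r"\*(?P<authors>.+?)\*\.\s+"                  # italic author list
    r"(?P<venue>[^,]+),\s+(?P<year>\d{4})\.\s+"   # venue and 4-digit year
    r"(?P<links>(\[\[\w+\]\]\(\S+\)\s*)+)$"       # one or more [[X]](url) links
)

def check_entry(line: str) -> bool:
    """Return True if the line matches the contribution template."""
    return ENTRY.match(line) is not None

# Placeholder URLs, for illustration only.
example = ("- **MAttNet: Modular Attention Network for Referring Expression "
           "Comprehension**. *Licheng Yu et al*. CVPR, 2018. "
           "[[Paper]](https://example.com/paper) "
           "[[Code]](https://example.com/code)")
```

`check_entry(example)` returns True, while a line missing the bold title, the year, or the links is rejected.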

Acknowledgement

This repo is maintained by Daqing LIU.

Other Awesome Vision-Language lists: Awesome Vision-Language Navigation, Awesome-Video-Captioning.