Awesome-Referring-Image-Segmentation

A collection of referring image segmentation papers and datasets.

Feel free to create a PR or an issue.

1. Datasets

| Short name | Paper | Source | Code/Project Link |
| --- | --- | --- | --- |
| MeViS | MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | ICCV 2023 | [dataset] [project] |
| gRefCOCO | GRES: Generalized Referring Expression Segmentation | CVPR 2023 | [dataset] [project] |
| ClevrTex | ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation | NeurIPS Datasets and Benchmarks 2021 | [project] |
| ScanRefer | ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language | ECCV 2020 | [project] |
| VGPhraseCut | PhraseCut: Language-based Image Segmentation in the Wild | CVPR 2020 | [project] |
| CLEVR-Ref+ | CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions | CVPR 2019 | [project] |
| UNC | Modeling context in referring expressions | ECCV 2016 | [dataset] |
| UNC+ | Modeling context in referring expressions | ECCV 2016 | [dataset] |
| Google-Ref | Generation and comprehension of unambiguous object descriptions | CVPR 2016 | [dataset] |
| ReferIt | Referit game: Referring to objects in photographs of natural scenes | EMNLP 2014 | [project] |

2. Challenges

| Name | Workshop | Date | Submission Link |
| --- | --- | --- | --- |
| 1st MeViS Challenge | CVPR 2024 Workshop: Pixel-level Video Understanding in the Wild | May 2024 | [CodaLab] |
| RVOS Challenge | ECCV 2024 Workshop: The 6th Large-scale Video Object Segmentation Challenge | Aug 2024 | [CodaLab] |

3. Traditional Referring Image Segmentation

| Short name | Paper | Source | Code/Project Link |
| --- | --- | --- | --- |
| Shared-RIS | A Simple Baseline with Single-encoder for Referring Image Segmentation | arXiv 24.08 | [code] |
| ASDA | Adaptive Selection based Referring Image Segmentation | ACM MM 2024 | [code] |
| NeMo | Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation | ECCV 2024 | [webpage] [code] |
| ReMamber | ReMamber: Referring Image Segmentation with Mamba Twister | ECCV 2024 | [code] |
| GTMS | GTMS: A Gradient-driven Tree-guided Mask-free Referring Image Segmentation Method | ECCV 2024 | [code] |
| SAM4MLLM | SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation | ECCV 2024 | [code] |
| Pseudo-RIS | Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation | ECCV 2024 | [code] |
| SafaRi | SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation | ECCV 2024 | [webpage] |
| CM-MaskSD | CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation | TMM 2024 | |
| Prompt-RIS | Prompt-Driven Referring Image Segmentation with Instance Contrasting | CVPR 2024 | |
| LQMFormer | LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation | CVPR 2024 | |
| PPT | Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation | CVPR 2024 | |
| GSVA | GSVA: Generalized Segmentation via Multimodal Large Language Models | CVPR 2024 | [code] |
| RMSIN | Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation | CVPR 2024 | [code] |
| MRES | Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation | CVPR 2024 | [code] [webpage] |
| MagNet | Mask Grounding for Referring Image Segmentation | CVPR 2024 | [webpage] |
| LISA | LISA: Reasoning Segmentation via Large Language Model | CVPR 2024 | [code] |
| RefSegformer | Towards Robust Referring Image Segmentation | TIP 2024 | [code] |
| JMCELN | Referring Image Segmentation via Joint Mask Contextual Embedding Learning and Progressive Alignment Network | EMNLP 2023 | [code] |
| CVMN | Unsupervised Domain Adaptation for Referring Semantic Segmentation | ACM MM 2023 | [code] |
| CARIS | CARIS: Context-Aware Referring Image Segmentation | ACM MM 2023 | [code] |
| TAS | Text Augmented Spatial-aware Zero-shot Referring Image Segmentation | EMNLP 2023 | |
| BKINet | Bilateral Knowledge Interaction Network for Referring Image Segmentation | TMM 2023 | [code] |
| Group-RES | Advancing Referring Expression Segmentation Beyond Single Image | ICCV 2023 | [code] |
| - | Weakly Supervised Referring Image Segmentation with Intra-Chunk and Inter-Chunk Consistency | ICCV 2023 | |
| - | Shatter and Gather: Learning Referring Image Segmentation with Text Supervision | ICCV 2023 | |
| TRIS | Referring Image Segmentation Using Text Supervision | ICCV 2023 | [code] |
| RIS-DMMI | Beyond One-to-One: Rethinking the Referring Image Segmentation | ICCV 2023 | [code] |
| ETRIS | Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation | ICCV 2023 | [code] |
| SEEM | Segment Everything Everywhere All at Once | arXiv 23.04 | [code] |
| SLViT | SLViT: Scale-Wise Language-Guided Vision Transformer for Referring Image Segmentation | IJCAI 2023 | [code] |
| WiCo | WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation | IJCAI 2023 | |
| M3Att | Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation | TIP 2023 | |
| X-Decoder | X-Decoder: Generalized Decoding for Pixel, Image and Language | CVPR 2023 | [code] [project] |
| Partial-RES | Learning to Segment Every Referring Object Point by Point | CVPR 2023 | [code] |
| MCRES | Meta Compositional Referring Expression Segmentation | CVPR 2023 | |
| Global-Local CLIP | Zero-shot Referring Image Segmentation with Global-Local Context Features | CVPR 2023 | [code] |
| PolyFormer | PolyFormer: Referring Image Segmentation as Sequential Polygon Generation | CVPR 2023 | [code] [project] |
| GRES | GRES: Generalized Referring Expression Segmentation | CVPR 2023 | [code] [dataset] [project] |
| CGFormer | Contrastive Grouping with Transformer for Referring Image Segmentation | CVPR 2023 | [code] |
| SADLR | Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation | AAAI 2023 | |
| R-RIS | Towards Robust Referring Image Segmentation | arXiv 22.09 | [code] [project] |
| - | Learning From Box Annotations for Referring Image Segmentation | TNNLS 2022 | [code] |
| - | Instance-Specific Feature Propagation for Referring Segmentation | TMM 2022 | |
| LAVT | LAVT: Language-Aware Vision Transformer for Referring Image Segmentation | CVPR 2022 | [code] |
| CRIS | CRIS: CLIP-Driven Referring Image Segmentation | CVPR 2022 | [code] |
| ReSTR | ReSTR: Convolution-free Referring Image Segmentation Using Transformers | CVPR 2022 | [project] |
| TV-Net | Two-stage Visual Cues Enhancement Network for Referring Image Segmentation | ACM MM 2021 | [code] |
| VLT | Vision-Language Transformer and Query Generation for Referring Segmentation | ICCV 2021 | [code] |
| MDETR | MDETR - Modulated Detection for End-to-End Multi-Modal Understanding | ICCV 2021 | [code] [project] |
| CEFNet | Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation | CVPR 2021 | [code] |
| BUSNet | Bottom-Up Shift and Reasoning for Referring Image Segmentation | CVPR 2021 | [code] |
| LTS | Locate then Segment: A Strong Pipeline for Referring Image Segmentation | CVPR 2021 | |
| CGAN | Cascade Grouped Attention Network for Referring Expression Segmentation | ACM MM 2020 | |
| LSCM | Linguistic Structure Guided Context Modeling for Referring Image Segmentation | ECCV 2020 | [code] |
| CMPC-Refseg | Referring Image Segmentation via Cross-Modal Progressive Comprehension | CVPR 2020 | [code] |
| BRINet | Bi-directional Relationship Inferring Network for Referring Image Segmentation | CVPR 2020 | [code] |
| PhraseCut | PhraseCut: Language-based Image Segmentation in the Wild | CVPR 2020 | [code] [project] |
| MCN | Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation | CVPR 2020 | [code] |
| - | Dual Convolutional LSTM Network for Referring Image Segmentation | TMM 2020 | |
| STEP | See-Through-Text Grouping for Referring Image Segmentation | ICCV 2019 | |
| lang2seg | Referring Expression Object Segmentation with Caption-Aware Consistency | BMVC 2019 | [code] |
| CMSA | Cross-Modal Self-Attention Network for Referring Image Segmentation | CVPR 2019 | [code] |
| KWA | Key-Word-Aware Network for Referring Expression Image Segmentation | ECCV 2018 | [code] |
| DMN | Dynamic Multimodal Instance Segmentation Guided by Natural Language Queries | ECCV 2018 | [code] |
| RRN | Referring Image Segmentation via Recurrent Refinement Networks | CVPR 2018 | [code] |
| MAttNet | MAttNet: Modular Attention Network for Referring Expression Comprehension | CVPR 2018 | [code] [demo] |
| RMI | Recurrent Multimodal Interaction for Referring Image Segmentation | ICCV 2017 | [code] |
| LSTM-CNN | Segmentation from natural language expressions | ECCV 2016 | [code] [project] |

4. Interactive Referring Image Segmentation

| Short name | Paper | Source | Code/Project Link |
| --- | --- | --- | --- |
| PhraseClick | PhraseClick: Toward Achieving Flexible Interactive Segmentation by Phrase and Click | ECCV 2020 | |

5. Referring Video Object Segmentation

| Short name | Paper | Source | Code/Project Link |
| --- | --- | --- | --- |
| VD-IT | Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation | ECCV 2024 | [code] |
| DsHmp | Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | CVPR 2024 | [code] |
| LoSh | LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation | CVPR 2024 | [code] |
| SOC | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | NeurIPS 2023 | [code] |
| Locater | Local-Global Context Aware Transformer for Language-Guided Video Segmentation | TPAMI 2023 | [code] [dataset] |
| TempCD | Temporal Collection and Distribution for Referring Video Object Segmentation | ICCV 2023 | [project] [code] |
| HTML | HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation | ICCV 2023 | [project] |
| LMPM | MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | ICCV 2023 | [code] [project] |
| OnlineRefer | OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation | ICCV 2023 | [code] |
| SgMg | Spectrum-guided Multi-granularity Referring Video Object Segmentation | ICCV 2023 | [code] |
| R2VOS | Towards Robust Referring Video Object Segmentation with Cyclic Relational Consistency | ICCV 2023 | [code] |
| MANet | Multi-Attention Network for Compressed Video Referring Object Segmentation | ACM MM 2022 | [code] |
| MTTR | End-to-End Referring Video Object Segmentation with Multimodal Transformers | CVPR 2022 | [code] |
| ReferFormer | Language as Queries for Referring Video Object Segmentation | CVPR 2022 | [code] |
| LBDT | Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation | CVPR 2022 | [code] |
| - | Multi-Level Representation Learning with Semantic Alignment for Referring Video Object Segmentation | CVPR 2022 | |
| YOFO | You Only Infer Once: Cross-Modal Meta-Transfer for Referring Video Object Segmentation | AAAI 2022 | |
| CITD | Rethinking Cross-modal Interaction from a Top-down Perspective for Referring Video Object Segmentation | CVPRW 2021 | |
| ClawCraneNet | ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation | arXiv 21.03 | |
| RefVOS | RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation | arXiv 20.10 | |
| URVOS | URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark | ECCV 2020 | [code] |
| - | Video Object Segmentation with Language Referring Expressions | ACCV 2018 | |

6. 3D Referring Segmentation

| Short name | Paper | Source | Code/Project Link |
| --- | --- | --- | --- |
| X-RefSeg3D | X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks | AAAI 2024 | [code] |
| 3D-STMN | 3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation | AAAI 2024 | [code] |
| SegPoint | SegPoint: Segment Any Point Cloud via Large Language Model | ECCV 2024 | [project] |
| 3D-GRES | 3D-GRES: Generalized 3D Referring Expression Segmentation | ACM MM 2024 | [code] |
| RefMask3D | RefMask3D: Language-Guided Transformer for 3D Referring Segmentation | ACM MM 2024 | [code] |
| TGNN | Text-Guided Graph Neural Networks for Referring 3D Instance Segmentation | AAAI 2021 | |
| InstanceRefer | InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring | ICCV 2021 | [code] |