Awesome-Open-Vocabulary-Semantic-Segmentation

A curated list of publications on open-vocabulary semantic segmentation.

If you find this project helpful, please consider giving it a star ⭐.

Contents

Open-Vocabulary Semantic Segmentation

Fully-Supervised Open-Vocabulary Semantic Segmentation

The model is trained on fully supervised semantic segmentation datasets with pixel-level annotations (e.g., COCO-Stuff).

  1. [LSeg] | ICLR'22 | Language-driven Semantic Segmentation | [pdf] | [code]
  2. [ZegFormer] | CVPR'22 | ZegFormer: Decoupling Zero-Shot Semantic Segmentation | [pdf] | [code]
  3. [OpenSeg] | ECCV'22 | Scaling Open-vocabulary Image Segmentation with Image-level Labels | [pdf] | [code]
  4. [Xu et al.] | ECCV'22 | A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model | [pdf] | [code]
  5. [SegCLIP] | ICML'23 | SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
  6. [OVSeg] | CVPR'23 | Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP | [pdf] | [code]
  7. [X-Decoder] | CVPR'23 | Generalized Decoding for Pixel, Image, and Language | [pdf] | [code]
  8. [SAN] | CVPR'23(Highlight) | Side Adapter Network for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
  9. [SAN] | TPAMI'23 | SAN: Side Adapter Network for Open-vocabulary Semantic Segmentation | [pdf] | [code]
  10. [ODISE] | CVPR'23 | Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | [pdf] | [code]
  11. [FreeSeg] | CVPR'23 | FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation | [pdf] | [code]
  12. [CAT-Seg] | Arxiv'23 | CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
  13. [ADA] | Arxiv'23 | Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation | [pdf]
  14. [OpenSeeD] | ICCV'23 | A Simple Framework for Open-Vocabulary Segmentation and Detection | [pdf] | [code]
  15. [GKC] | ICCV'23 | Global Knowledge Calibration for Fast Open-Vocabulary Segmentation | [pdf]
  16. [OPSNet] | ICCV'23 | Open-vocabulary Panoptic Segmentation with Embedding Modulation | [pdf] | [code]
  17. [MasQCLIP] | ICCV'23 | MasQCLIP for Open-Vocabulary Universal Image Segmentation | [pdf]
  18. [DeOP] | ICCV'23 | Open Vocabulary Semantic Segmentation with Decoupled One-Pass Network | [pdf] | [code]
  19. [HIPIE] | NeurIPS'23 | Hierarchical Open-vocabulary Universal Image Segmentation | [pdf] | [code]
  20. [FC-CLIP] | NeurIPS'23 | Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP | [pdf] | [code]

Weakly-Supervised Open-Vocabulary Semantic Segmentation

The model is trained on weakly supervised datasets with only image-level annotations or captions (e.g., CC12M).

  1. [GroupViT] | CVPR'22 | GroupViT: Semantic Segmentation Emerges from Text Supervision | [pdf] | [code]
  2. [ViL-Seg] | ECCV'22 | Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding | [pdf]
  3. [MaskCLIP+] | ECCV'22(Oral) | Extract Free Dense Labels from CLIP | [pdf] | [code]
  4. [ViewCo] | ICLR'23 | ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency | [pdf]
  5. [SegCLIP] | ICML'23 | SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
  6. [CLIP-S4] | CVPR'23 | CLIP-S4: Language-Guided Self-Supervised Semantic Segmentation | [pdf]
  7. [PACL] | CVPR'23 | Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning | [pdf]
  8. [OVSegmentor] | CVPR'23 | Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision | [pdf] | [code]
  9. [SimSeg] | CVPR'23 | A Simple Framework for Text-Supervised Semantic Segmentation | [pdf] | [code]
  10. [TCL] | CVPR'23 | Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs | [pdf] | [code]
  11. [ZeroSeg] | CVPR'23 | Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only | [pdf]
  12. [CLIPpy] | ICCV'23 | Perceptual Grouping in Contrastive Vision-Language Models | [pdf] | [code]
  13. [MixReorg] | ICCV'23 | MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation | [pdf]
  14. [Zhang et al.] | Arxiv'23 | Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation | [pdf]
  15. [SimCon] | Arxiv'23 | SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation | [pdf]
  16. [CoCu] | NeurIPS'23 | Bridging Semantic Gaps for Language-Supervised Semantic Segmentation | [pdf] | [code]

Training-Free Open-Vocabulary Semantic Segmentation

The model is adapted from off-the-shelf large models (e.g., CLIP, diffusion models) without an additional training phase.

  1. [MaskCLIP] | ECCV'22(Oral) | Extract Free Dense Labels from CLIP | [pdf] | [code]
  2. [CLIP Surgery] | Arxiv'23 | CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks | [pdf] | [code]
  3. [OVDiff] | Arxiv'23 | Diffusion Models for Zero-Shot Open-Vocabulary Segmentation | [pdf]
  4. [CLIP-DIY] | Arxiv'23 | CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free | [pdf]
  5. [DiffSegmenter] | Arxiv'23 | Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter | [pdf]
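
As a rough illustration of the training-free setting, the sketch below assigns each image patch the class name whose text embedding is most similar to the patch feature. It is a minimal toy example and not the method of any paper listed here: the function names `normalize` and `segment` are hypothetical, and the small hand-written vectors stand in for the outputs of real pre-trained image and text encoders.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def segment(patch_features, text_embeddings):
    """Assign each patch the index of the most similar class-name embedding.

    patch_features: grid (list of rows) of feature vectors, one per image patch.
    text_embeddings: one vector per class name in the user-supplied vocabulary.
    """
    texts = [normalize(t) for t in text_embeddings]
    label_map = []
    for row in patch_features:
        label_row = []
        for feat in row:
            f = normalize(feat)
            # Cosine similarity between this patch and every class name.
            sims = [sum(a * b for a, b in zip(f, t)) for t in texts]
            label_row.append(max(range(len(sims)), key=sims.__getitem__))
        label_map.append(label_row)
    return label_map

# Toy stand-ins for real encoder outputs (hypothetical values, not from any model).
vocabulary = ["sky", "grass"]
text_embeddings = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
patch_features = [
    [[0.9, 0.1, 0.0], [0.8, 0.3, 0.1]],
    [[0.1, 0.7, 0.0], [0.2, 0.9, 0.3]],
]

labels = segment(patch_features, text_embeddings)
named = [[vocabulary[i] for i in row] for row in labels]
print(named)
```

Because the vocabulary is just a list of strings encoded at inference time, changing the set of classes only requires new text embeddings and no retraining, which is the property the "open-vocabulary" label refers to.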

Open-Vocabulary Object Detection

  1. [RO-ViT] | CVPR'23(Highlight) | Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers | [pdf] | [code]
  2. [CAT] | CVPR'23 | CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection | [pdf] | [code]
  3. [DetCLIPv2] | CVPR'23 | DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment | [pdf]
  4. [CondHead] | CVPR'23 | Learning to Detect and Segment for Open Vocabulary Object Detection | [pdf]
  5. [CORA] | CVPR'23 | CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching | [pdf] | [code]
  6. [ovdet] | CVPR'23 | Aligning Bag of Regions for Open-Vocabulary Object Detection | [pdf] | [code]
  7. [OADP] | CVPR'23 | Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection | [pdf] | [code]
  8. [F-VLM] | ICLR'23 | F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models | [pdf] | [code]
  9. [MMC-Det] | Arxiv'23 | Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection | [pdf]
  10. [IPL] | Arxiv'23 | Improving Pseudo Labels for Open-Vocabulary Object Detection | [pdf]
  11. [mm-ovod] | ICML'23 | Multi-Modal Classifiers for Open-Vocabulary Object Detection | [pdf] | [code]
  12. [EdaDet] | ICCV'23 | EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment | [pdf] | [code]
  13. [SGDN] | Arxiv'23 | Open-Vocabulary Object Detection via Scene Graph Discovery | [pdf]

Related Surveys

  1. Towards Open Vocabulary Learning: A Survey | [pdf]
  2. A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future | [pdf]

Feedback

If you have any suggestions or find missing papers, please don't hesitate to contact me.