Awesome-CIR

A collection of papers on Composed Image Retrieval (CIR), covering:
1. Attribute-based CIR
2. Supervised CIR
3. Few-shot CIR
4. Zero-shot CIR
5. Semi-supervised CIR
6. Conversational CIR
7. Composed Video Retrieval (COVR)
8. Sketch-based CIR
9. Others

1. Attribute-based CIR

2021

  • [1] [ICCV'21] | Learning Attribute-driven Disentangled Representations for Interactive Fashion Retrieval. [Paper]
  • [2] [ICCV'21] | Face Image Retrieval with Attribute Manipulation. [Paper]

2020

  • [1] [SIGIR'20] | Generative Attribute Manipulation Scheme for Flexible Fashion Search. [Paper]

2018

  • [1] [CVPR'18] | Learning Attribute Representations with Localization for Flexible Fashion Search. [Paper]
  • [2] [WACV'18] | Efficient Multi-Attribute Similarity Learning Towards Attribute-based Fashion Search. [Paper]

2017

  • [1] [CVPR'17] | Memory-Augmented Attribute Manipulation Networks for Interactive Fashion Search. [Paper]

2. Supervised CIR

Pre-prints

  • [1] [Arxiv'24] | VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval. [Paper]
  • [2] [Arxiv'24] | Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives. [Paper]
  • [3] [Arxiv'23] | Relieving Triplet Ambiguity: Consensus Network for Language-Guided Image Retrieval. [Paper]
  • [4] [Arxiv'23] | Ranking-aware Uncertainty for Text-guided Image Retrieval. [Paper]
  • [5] [Arxiv'23] | Learning with Multi-modal Gradient Attention for Explainable Composed Image Retrieval. [Paper]
  • [6] [Arxiv'23] | VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering. [Paper]

2024

  • [1] [WACV'24] | Bi-directional Training for Composed Image Retrieval via Text Prompt Learning. [Paper]
  • [2] [TOMM'24] | Cross-Modal Attention Preservation with Self-Contrastive Learning for Composed Query-Based Image Retrieval. [Paper]
  • [3] [TOMM'24] | SPIRIT: Style-guided Patch Interaction for Fashion Image Retrieval with Text Feedback. [Paper]
  • [4] [TPAMI'24] | Self-Training Boosted Multi-Factor Matching Network for Composed Image Retrieval. [Paper]
  • [5] [AAAI'24] | Dynamic Weighted Combiner for Mixed-Modal Image Retrieval. [Paper]
  • [6] [AAAI'24] | Data Roaming and Quality Assessment for Composed Image Retrieval. [Paper]
  • [7] [AAAI'24] | FashionERN: Enhance-and-Refine Network for Composed Fashion Image Retrieval. [Paper]
  • [8] [AAAI'24] | Decomposing Semantic Shifts for Composed Image Retrieval. [Paper]
  • [9] [SIGIR'24] | Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval. [Paper]
  • [10] [SIGIR'24] | CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval. [Paper]
  • [11] [CVPR'24] | SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining. [Paper]
  • [12] [ICLR'24] | Sentence-level Prompts Benefit Composed Image Retrieval. [Paper]
  • [13] [ICLR'24] | Composed Image Retrieval with Text Feedback via Multi-Grained Uncertainty Regularization. [Paper]
  • [14] [TMLR'24] | Candidate Set Re-ranking for Composed Image Retrieval with Dual Multimodal Encoder. [Paper]
  • [15] [TCSVT'24] | Set of Diverse Queries with Uncertainty Regularization for Composed Image Retrieval. [Paper]
  • [16] [ICMR'24] | CLIP-ProbCR: CLIP-based Probability embedding Combination Retrieval. [Paper]
  • [17] [TMM'24] | Align and Retrieve: Composition and Decomposition Learning in Image Retrieval with Text Feedback. [Paper]
  • [18] [KBS'24] | Collaborative Group: Composed Image Retrieval via Consensus Learning From Noisy Annotations. [Paper]
  • [19] [TIP'24] | Multimodal Composition Example Mining for Composed Query Image Retrieval. [Paper]
  • [20] [TOIS'24] | LLM-enhanced Composed Image Retrieval: An Intent Uncertainty-aware Linguistic-Visual Dual Channel Matching Model. [Paper]

2023

  • [1] [TOMM'23] | AMC: Adaptive Multi-expert Collaborative Network for Text-guided Image Retrieval. [Paper]
  • [2] [TMM'23] | Multi-Modal Transformer With Global-Local Alignment for Composed Query Image Retrieval. [Paper]
  • [3] [WACV'23] | Fashion Image Retrieval with Text Feedback by Additive Attention Compositional Learning. [Paper]
  • [4] [ICMR'23] | Dual-Path Semantic Construction Network for Composed Query-Based Image Retrieval. [Paper]
  • [5] [TCSVT'23] | Multi-Grained Attention Network With Mutual Exclusion for Composed Query-Based Image Retrieval. [Paper]
  • [6] [TOMM'23] | Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features. [Paper]
  • [7] [ICME'23] | Visual-Linguistic Alignment and Composition for Image Retrieval with Text Feedback. [Paper]
  • [8] [TIP'23] | Composed Image Retrieval via Cross Relation Network With Hierarchical Aggregation Transformer. [Paper]
  • [9] [MM'23] | Target-Guided Composed Image Retrieval. [Paper]
  • [10] [CVPR'23] | FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks. [Paper]
  • [11] [ICCVW'23] | ProVLA: Compositional Image Search with Progressive Vision-Language Alignment and Multimodal Fusion. [Paper]
  • [12] [NeurIPSW'23] | NEUCORE: Neural Concept Reasoning for Composed Image Retrieval. [Paper]
  • [13] [NeurIPSW'23] | Benchmarking Robustness of Text-Image Composed Retrieval. [Paper]
  • [14] [MMW'23] | Fashion-GPT: Integrating LLMs with Fashion Retrieval System. [Paper]

2022

  • [1] [ICLR'22] | ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity. [Paper][Arxiv]
  • [2] [TOMM'22] | Tell, Imagine, and Search: End-to-end Learning for Composing Text and Image to Image Retrieval. [Paper]
  • [3] [TIP'22] | Geometry Sensitive Cross-Modal Reasoning for Composed Query Based Image Retrieval. [Paper]
  • [4] [TIP'22] | Composed Image Retrieval via Explicit Erasure and Replenishment With Semantic Alignment. [Paper]
  • [5] [WACV'22] | SAC: Semantic Attention Composition for Text-Conditioned Image Retrieval. [Paper]
  • [6] [SIGIR'22] | Progressive Learning for Image Retrieval with Hybrid-Modality Queries. [Paper]
  • [7] [CVPR'22] | Effective Conditioned and Composed Image Retrieval Combining CLIP-based Features. [Paper]
  • [8] [CVPR'22] | FashionVLP: Vision Language Transformer for Fashion Retrieval with Feedback. [Paper]
  • [9] [TMM'22] | Enhance Composed Image Retrieval via Multi-Level Collaborative Localization and Semantic Activeness Perception. [Paper]
  • [10] [TMM'22] | Adversarial and Isotropic Gradient Augmentation for Image Retrieval With Text Feedback. [Paper]
  • [11] [EMNLP'22] | FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning. [Paper]
  • [12] [ECCV'22] | FashionViL: Fashion-Focused Vision-and-Language Representation Learning. [Paper]

2021

  • [1] [ICCV'21] | Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models. [Paper][Arxiv]
  • [2] [CVPR'21] | CoSMo: Content-Style Modulation for Image Retrieval with Text Feedback. [Paper]
  • [3] [WACV'21] | Compositional Learning of Image-Text Query for Image Retrieval. [Paper]
  • [4] [MM'21] | Heterogeneous Feature Fusion and Cross-modal Alignment for Composed Image Retrieval. [Paper]
  • [5] [MM'21] | Cross-modal Joint Prediction and Alignment for Composed Query Image Retrieval. [Paper]
  • [6] [MM'21] | Image Retrieval with Text Feedback by Deep Hierarchical Attention Mutual Information Maximization. [Paper]
  • [7] [AAAI'21] | Dual Compositional Learning in Interactive Image Retrieval. [Paper]
  • [8] [SIGIR'21] | Comprehensive Linguistic-Visual Composition Network for Image Retrieval. [Paper]

2020

  • [1] [CVPR'20] | Image Search With Text Feedback by Visiolinguistic Attention Learning. [Paper]
  • [2] [MM'20] | Joint Attribute Manipulation and Modality Alignment Learning for Composing Text and Image to Image Retrieval. [Paper]
  • [3] [ECCV'20] | Learning Joint Visual Semantic Matching Embeddings for Language-Guided Retrieval. [Paper]
  • [4] [CVPR'20] | Composed Query Image Retrieval Using Locally Bounded Features. [Paper]

2019

  • [1] [CVPR'19] | Composing Text and Image for Image Retrieval - an Empirical Odyssey. [Paper][Arxiv]

3. Few-shot CIR

Pre-prints

  • [1] [Arxiv'24] | Pseudo Triplet Guided Few-shot Composed Image Retrieval. [Paper]

2023

  • [1] [AAAI'23] | Few-Shot Composition Learning for Image Retrieval with Prompt Tuning. [Paper]

4. Zero-shot CIR

Pre-prints

  • [1] [Arxiv'24] | Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval. [Paper]
  • [2] [Arxiv'24] | iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval. [Paper]
  • [3] [Arxiv'24] | Reducing Task Discrepancy of Text Encoders for Zero-Shot Composed Image Retrieval. [Paper]
  • [4] [Arxiv'24] | Zero-shot Composed Image Retrieval Considering Query-target Relationship Leveraging Masked Image-text Pairs. [Paper]
  • [5] [Arxiv'24] | Training-free Zero-shot Composed Image Retrieval with Local Concept Re-ranking. [Paper]
  • [6] [Arxiv'24] | HyCIR: Boosting Zero-Shot Composed Image Retrieval with Synthetic Labels. [Paper]
  • [7] [Arxiv'24] | Training-free Zero-shot Composed Image Retrieval via Weighted Modality Fusion and Similarity. [Paper]
  • [8] [Arxiv'23] | Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval. [Paper]

2024

  • [1] [AAAI'24] | Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval. [Paper]
  • [2] [ICLR'24] | Vision-by-Language for Training-Free Compositional Image Retrieval. [Paper]
  • [3] [CVPR'24] | LinCIR: Language-only Training of Zero-shot Composed Image Retrieval. [Paper]
  • [4] [CVPR'24] | Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval. [Paper]
  • [5] [SIGIR'24] | Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval. [Paper]
  • [6] [SIGIR'24] | LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval. [Paper]
  • [7] [ICML'24] | Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning. [Paper]
  • [8] [ICML'24] | MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions. [Paper]
  • [9] [TMLR'24] | CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion. [Paper]

2023

  • [1] [ICCV'23] | Zero-shot Composed Image Retrieval with Textual Inversion. [Paper]
  • [2] [CVPR'23] | Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval. [Paper]
  • [3] [BMVC'23] | Zero-shot Composed Text-Image Retrieval. [Paper]

5. Semi-supervised CIR

2024

  • [1] [CVPR'24] | Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval. [Paper]

6. Conversational CIR

Pre-prints

  • [1] [Arxiv'24] | Leveraging Large Language Models for Multimodal Search. [Paper]

2023

  • [1] [ICCV'23] | FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory. [Paper]
  • [2] [MM'23] | Conversational Composed Retrieval with Iterative Sequence Refinement. [Paper]

2021

  • [1] [SIGIR'21] | Conversational Fashion Image Retrieval via Multiturn Natural Language Feedback. [Paper]

2018

  • [1] [NeurIPS'18] | Dialog-based Interactive Image Retrieval. [Paper]

7. Composed Video Retrieval (COVR)

Pre-prints

  • [1] [Arxiv'24] | Localizing Events in Videos with Multimodal Queries. [Paper]

2024

  • [1] [AAAI'24] | CoVR: Learning Composed Video Retrieval from Web Video Captions. [Paper]
  • [2] [CVPR'24] | Composed Video Retrieval via Enriched Context and Discriminative Embeddings. [Paper]
  • [3] [TPAMI'24] | CoVR-2: Automatic Data Construction for Composed Video Retrieval. [Paper]
  • [4] [ECCV'24] | EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval. [Paper]

8. Sketch-based CIR

2024

  • [1] [AAAI'24] | Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions. [Paper]
  • [2] [CVPR'24] | You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval. [Paper]

9. Others

Person Retrieval

  • [1] [Arxiv'24] | Word4Per: Zero-shot Composed Person Retrieval. [Paper]

Remote Sensing Retrieval

  • [1] [IGARSS'24] | Composed Image Retrieval for Remote Sensing. [Paper]
  • [2] [TGRS'24] | Scene Graph-Aware Hierarchical Fusion Network for Remote Sensing Image Retrieval With Text Feedback. [Paper]

Survey

  • [1] [Arxiv'24] | A Survey of Multimodal Composite Editing and Retrieval. [Paper]

Dataset

  • [1] [Arxiv'24] | EUFCC-CIR: a Composed Image Retrieval Dataset for GLAM Collections. [Paper]