Cross-modal Retrieval

1. Introduction
2. Supported Methods
3. Usage

1. Introduction

This open-source repository collects cross-modal retrieval methods and their code.
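The methods listed below fall into two broad families: hashing retrieval (Sections 2.1 and 2.2), which compares compact binary codes, and real-valued retrieval (Sections 2.3 and 2.4), which compares continuous embeddings. As a minimal sketch of that difference, and not the implementation of any listed method, the following ranks a toy image database against a text query in both ways. The function names `hamming_rank` and `cosine_rank` and all data are hypothetical, made up for illustration; the sketch assumes both modalities have already been mapped into a shared space.

```python
import math


def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to a binary query code."""
    # Distance = number of bit positions where the two codes differ.
    dists = [sum(a != b for a, b in zip(query_code, code)) for code in db_codes]
    # Smaller distance means a better match; sorted() is stable on ties.
    order = sorted(range(len(db_codes)), key=lambda i: dists[i])
    return order, dists


def cosine_rank(query_vec, db_vecs):
    """Rank database items by cosine similarity to a real-valued query."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    sims = [cosine(query_vec, v) for v in db_vecs]
    # Larger similarity means a better match.
    order = sorted(range(len(db_vecs)), key=lambda i: -sims[i])
    return order, sims


# Toy data (illustrative only): one text query against three images.
text_code = [1, 0, 1, 1]
image_codes = [
    [1, 0, 1, 0],  # differs in 1 bit
    [0, 1, 0, 0],  # differs in 4 bits
    [1, 0, 1, 1],  # identical code
]
order_hash, dists = hamming_rank(text_code, image_codes)

text_vec = [1.0, 0.0]
image_vecs = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
order_real, sims = cosine_rank(text_vec, image_vecs)

print(order_hash, dists)
print(order_real, [round(s, 3) for s in sims])
```

Binary codes trade some accuracy for cheap storage and fast bitwise comparison, while real-valued embeddings keep finer similarity information at a higher cost per comparison.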

2. Supported Methods

The currently supported algorithms include:

2.1 Unsupervised cross-modal hashing retrieval

2.1.1 Unsupervised shallow cross-modal hashing retrieval

2.1.1.1 Matrix Factorization

2017

RFDH:Robust and Flexible Discrete Hashing for Cross-Modal Similarity Search(TCSVT) [PDF] [Code]

2015

STMH:Semantic Topic Multimodal Hashing for Cross-Media Retrieval(IJCAI)[PDF]

2014

LSSH:Latent Semantic Sparse Hashing for Cross-Modal Similarity Search(SIGIR)[PDF]
CMFH:Collective Matrix Factorization Hashing for Multimodal Data(CVPR)[PDF]

2.1.1.2 Graph Theory

2018

HMR:Hetero-Manifold Regularisation for Cross-Modal Hashing(TPAMI)[PDF]

2017

FSH:Cross-Modality Binary Code Learning via Fusion Similarity Hashing(CVPR)[PDF][Code]

2014

SM2H:Sparse Multi-Modal Hashing(TMM)[PDF]

2013

IMH:Inter-Media Hashing for Large-scale Retrieval from Heterogeneous Data Sources(SIGMOD)[PDF]
LCMH:Linear Cross-Modal Hashing for Efficient Multimedia Search(MM)[PDF]

2011

CVH:Learning Hash Functions for Cross-View Similarity Search(IJCAI)[PDF]

2.1.1.3 Other Shallow

2019

CRE:Collective Reconstructive Embeddings for Cross-Modal Hashing(TIP)[PDF]

2018

HMR:Hetero-Manifold Regularisation for Cross-Modal Hashing(TPAMI)[PDF]

2015

FS-LTE:Full-Space Local Topology Extraction for Cross-Modal Retrieval(TIP)[PDF]

2014

IMVH:Iterative Multi-View Hashing for Cross Media Indexing(MM)[PDF]

2013

PDH:Predictable Dual-View Hashing(ICML)[PDF]

2.1.1.4 Quantization

2016

CCQ:Composite Correlation Quantization for Efficient Multimodal Retrieval(SIGIR)[PDF]
CMCQ:Collaborative Quantization for Cross-Modal Similarity Search(CVPR)[PDF]

2015

ACQ:Alternating Co-Quantization for Cross-modal Hashing(ICCV)[PDF]

2.1.2 Unsupervised deep cross-modal hashing retrieval

2.1.2.1 Naive Network

2019

UDFCH:Unsupervised Deep Fusion Cross-modal Hashing(ICMI)[PDF]

2018

UDCMH:Unsupervised Deep Hashing via Binary Latent Factor Models for Large-scale Cross-modal Retrieval(IJCAI)[PDF]

2017

DBRC:Deep Binary Reconstruction for Cross-modal Hashing(MM)[PDF]

2015

DMHOR:Learning Compact Hash Codes for Multimodal Representations Using Orthogonal Deep Structure(TMM)[PDF]

2.1.2.2 GAN

2020

MGAH:Multi-Pathway Generative Adversarial Hashing for Unsupervised Cross-Modal Retrieval(TMM)[PDF]

2019

CYC-DGH:Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval(TIP)[PDF]
UCH:Coupled CycleGAN: Unsupervised Hashing Network for Cross-Modal Retrieval(AAAI)[PDF]

2018

UGACH:Unsupervised Generative Adversarial Cross-modal Hashing(AAAI)[PDF][Code]

2.1.2.3 Graph Model

2022

ASSPH:Adaptive Structural Similarity Preserving for Unsupervised Cross Modal Hashing(MM)[PDF]

2021

AGCH:Aggregation-based Graph Convolutional Hashing for Unsupervised Cross-modal Retrieval(TMM)[PDF]
DGCPN:Deep Graph-neighbor Coherence Preserving Network for Unsupervised Cross-modal Hashing(AAAI)[PDF][Code]

2020

DCSH:Unsupervised Deep Cross-modality Spectral Hashing(TIP)[PDF]
SRCH:Set and Rebase: Determining the Semantic Graph Connectivity for Unsupervised Cross-Modal Hashing(IJCAI)[PDF]
JDSH:Joint-modal Distribution-based Similarity Hashing for Large-scale Unsupervised Deep Cross-modal Retrieval(SIGIR)[PDF][Code]
DSAH:Deep Semantic-Alignment Hashing for Unsupervised Cross-Modal Retrieval(ICMR)[PDF][Code]

2019

DJSRH:Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval(ICCV)[PDF][Code]

2.1.2.4 Knowledge Distillation

2022

DAEH:Deep Adaptively-Enhanced Hashing With Discriminative Similarity Guidance for Unsupervised Cross-Modal Retrieval(TCSVT)[PDF]

2021

KDCMH:Unsupervised Deep Cross-Modal Hashing by Knowledge Distillation for Large-scale Cross-modal Retrieval(ICMR)[PDF]
JOG:Joint-teaching: Learning to Refine Knowledge for Resource-constrained Unsupervised Cross-modal Retrieval(MM)[PDF]

2020

UKD:Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing(CVPR)[PDF]

2.2 Supervised cross-modal hashing retrieval

2.2.1 Supervised shallow cross-modal hashing retrieval

2.2.1.1 Matrix Factorization

2022

SCLCH: Joint Specifics and Consistency Hash Learning for Large-Scale Cross-Modal Retrieval(TIP) [PDF]

2020

BATCH: A Scalable Asymmetric Discrete Cross-Modal Hashing(TKDE) [PDF] [Code]

2019

LCMFH: Label Consistent Matrix Factorization Hashing for Large-Scale Cross-Modal Similarity Search(TPAMI) [PDF]
TECH: A Two-Step Cross-Modal Hashing by Exploiting Label Correlations and Preserving Similarity in Both Steps(MM) [PDF]

2018

SCRATCH: A Scalable Discrete Matrix Factorization Hashing for Cross-Modal Retrieval(MM) [PDF]

2017

DCH: Learning Discriminative Binary Codes for Large-scale Cross-modal Retrieval(TIP) [PDF]

2016

SMFH: Supervised Matrix Factorization for Cross-Modality Hashing(IJCAI) [PDF]
SMFH: Supervised Matrix Factorization Hashing for Cross-Modal Retrieval(TIP) [PDF]

2.2.1.2 Dictionary Learning

2016

DCDH: Discriminative Coupled Dictionary Hashing for Fast Cross-Media Retrieval(MM) [PDF]

2014

DLCMH: Dictionary Learning Based Hashing for Cross-Modal Retrieval(SIGIR) [PDF]

2.2.1.3 Feature Mapping-Sample-Constraint-Label-Constraint

2022

DJSAH: Discrete Joint Semantic Alignment Hashing for Cross-Modal Image-Text Search(TCSVT) (PDF)

2020

FUH: Fast Unmediated Hashing for Cross-Modal Retrieval(TCSVT) (PDF)

2016

MDBE: Multimodal Discriminative Binary Embedding for Large-Scale Cross-Modal Retrieval(TIP) (PDF) [Code]

2.2.1.4 Feature Mapping-Sample-Constraint-Separate-Hamming

2017

CSDH: Sequential Discrete Hashing for Scalable Cross-Modality Similarity Retrieval(TIP) (PDF)

2016

DASH: Frustratingly Easy Cross-Modal Hashing(MM) (PDF)

2015

QCH: Quantized Correlation Hashing for Fast Cross-Modal Search(IJCAI) (PDF)

2.2.1.5 Feature Mapping-Sample-Constraint-Common Hamming

2021

ASCSH: Asymmetric Supervised Consistent and Specific Hashing for Cross-Modal Retrieval(TIP) (PDF) [Code]

2019

SRDMH: Supervised Robust Discrete Multimodal Hashing for Cross-Media Retrieval(TMM) (PDF)

2018

FDCH: Fast Discrete Cross-modal Hashing With Regressing From Semantic Labels(MM) (PDF)

2017

SRSH: Semi-Relaxation Supervised Hashing for Cross-Modal Retrieval(MM) (PDF) [Code]
RoPH: Cross-Modal Hashing via Rank-Order Preserving(TMM) (PDF) [Code]

2016

SRDMH: Supervised Robust Discrete Multimodal Hashing for Cross-Media Retrieval(CIKM) (PDF)

2.2.1.6 Feature Mapping-Relation-Constraint

2017

LSRH: Linear Subspace Ranking Hashing for Cross-Modal Retrieval(TPAMI) (PDF)

2014

SCM: Large-Scale Supervised Multimodal Hashing with Semantic Correlation Maximization(AAAI) (PDF)
HTH: Scalable Heterogeneous Translated Hashing(KDD) (PDF)

2013

PLMH: Parametric Local Multimodal Hashing for Cross-View Similarity Search(IJCAI) (PDF)
RaHH: Comparing Apples to Oranges: A Scalable Solution with Heterogeneous Hashing(KDD) (PDF) [Code]

2012

CRH: Co-Regularized Hashing for Multimodal Data(NIPS) (PDF)

2.2.1.7 Other Shallow

2019

DLFH: Discrete Latent Factor Model for Cross-Modal Hashing(TIP) (PDF) [Code]

2018

SDMCH: Supervised Discrete Manifold-Embedded Cross-Modal Hashing(IJCAI) (PDF)

2015

SePH: Semantics-Preserving Hashing for Cross-View Retrieval(CVPR) (PDF)

2012

MLBE: A Probabilistic Model for Multimodal Hash Function Learning(KDD) (PDF)

2010

CMSSH: Data Fusion through Cross-modality Metric Learning using Similarity-Sensitive Hashing(CVPR) (PDF)

2.2.2 Supervised deep cross-modal hashing retrieval

2.2.2.1 Naive Network-Distance-Constraint

2019

MCITR: Cross-modal Image-Text Retrieval with Multitask Learning(CIKM) (PDF)

2016

CAH: Correlation Autoencoder Hashing for Supervised Cross-Modal Search(ICMR) (PDF)

2014

CMNNH: Cross-Media Hashing with Neural Networks(MM) (PDF)
MMNN: Multimodal Similarity-Preserving Hashing(TPAMI) (PDF)

2.2.2.2 Naive Network-Similarity-Constraint

2022

Bi-CMR: Bidirectional Reinforcement Guided Hashing for Effective Cross-Modal Retrieval(AAAI) (PDF) [Code]
Bi-NCMH: Deep Normalized Cross-Modal Hashing with Bi-Direction Relation Reasoning(CVPR) (PDF)

2021

OTCMR: Bridging Heterogeneity Gap with Optimal Transport for Cross-modal Retrieval(CIKM) (PDF)
DUCMH: Deep Unified Cross-Modality Hashing by Pairwise Data Alignment(IJCAI) (PDF)

2020

NRDH: Nonlinear Robust Discrete Hashing for Cross-Modal Retrieval(SIGIR) (PDF)
DCHUC: Deep Cross-Modal Hashing with Hashing Functions and Unified Hash Codes Jointly Learning(TKDE) (PDF) [Code]

2017

CHN: Correlation Hashing Network for Efficient Cross-Modal Retrieval(BMVC) (PDF)

2016

DVSH: Deep Visual-Semantic Hashing for Cross-Modal Retrieval(KDD) (PDF)

2.2.2.3 Naive Network-Negative-Log-Likelihood

2022

MSSPQ: Multiple Semantic Structure-Preserving Quantization for Cross-Modal Retrieval(ICMR) (PDF)

2021

DMFH: Deep Multiscale Fusion Hashing for Cross-Modal Retrieval(TCSVT) (PDF)
TEACH: Attention-Aware Deep Cross-Modal Hashing(ICMR) (PDF)

2020

MDCH: Mask Cross-modal Hashing Networks(TMM) (PDF)

2019

EGDH: Equally-Guided Discriminative Hashing for Cross-modal Retrieval(IJCAI) (PDF)

2018

DDCMH: Dual Deep Neural Networks Cross-Modal Hashing(AAAI) (PDF)
CMHH: Cross-Modal Hamming Hashing(ECCV) (PDF)

2017

PRDH: Pairwise Relationship Guided Deep Hashing for Cross-Modal Retrieval(AAAI) (PDF)
DCMH: Deep Cross-Modal Hashing(CVPR) (PDF) [Code]

2.2.2.4 Naive Network-Triplet-Constraint

2019

RDCMH: Ranking-based Deep Cross-modal Hashing(AAAI) (PDF)

2018

MCSCH: Multi-Scale Correlation for Sequential Cross-modal Hashing Learning(MM) (PDF)
TDH: Triplet-Based Deep Hashing Network for Cross-Modal Retrieval(TIP) (PDF)

2.2.2.5 GAN

2022

SCAHN: Semantic Structure Enhanced Contrastive Adversarial Hash Network for Cross-media Representation Learning(MM) (PDF) [Code]

2021

TGCR: Multiple Semantic Structure-Preserving Quantization for Cross-Modal Retrieval(TCSVT) (PDF)

2020

CPAH: Multi-Task Consistency-Preserving Adversarial Hashing for Cross-Modal Retrieval(TIP) (PDF) [Code]
MLCAH: Multi-Level Correlation Adversarial Hashing for Cross-Modal Retrieval(TMM) (PDF)
DADH: Deep Adversarial Discrete Hashing for Cross-Modal Retrieval(ICMR) (PDF) [Code]

2019

AGAH: Adversary Guided Asymmetric Hashing for Cross-Modal Retrieval(ICMR) (PDF) [Code]

2018

SSAH: Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval(CVPR) (PDF) [Code]

2.2.2.6 Graph Model

2022

HMAH: Multi-Task Consistency-Preserving Adversarial Hashing for Cross-Modal Retrieval(TMM) (PDF)
SCAHN: Semantic Structure Enhanced Contrastive Adversarial Hash Network for Cross-media Representation Learning(MM) (PDF) [Code]

2021

LGCNH: Local Graph Convolutional Networks for Cross-Modal Hashing(MM) (PDF) [Code]

2019

GCH: Graph Convolutional Network Hashing for Cross-Modal Retrieval(IJCAI) (PDF) [Code]

2.2.2.7 Transformer

2022

DCHMT: Differentiable Cross-modal Hashing via Multimodal Transformers(CIKM) (PDF) [Code]
UniHash: Contrastive Label Correlation Enhanced Unified Hashing Encoder for Cross-modal Retrieval(MM) (PDF) [Code]

2.2.2.8 Memory Network

2021

CMPD: Using Cross Memory Network With Pair Discrimination for Image-Text Retrieval(TCSVT) (PDF)

2019

CMMN: Deep Memory Network for Cross-Modal Retrieval(TMM) (PDF)

2.2.2.9 Quantization

2022

ACQH: Asymmetric Correlation Quantization Hashing for Cross-Modal Retrieval(TMM) (PDF)

2017

CDQ: Collective Deep Quantization for Efficient Cross-Modal Retrieval(AAAI) (PDF) [Code]

2.3 Unsupervised cross-modal real-valued retrieval

2.3.1 Early unsupervised cross-modal real-valued retrieval

2.3.1.1 CCA

2017

ICCA:Towards Improving Canonical Correlation Analysis for Cross-modal Retrieval(MM) [PDF]

2015

DCMIT:Deep Correlation for Matching Images and Text(CVPR) [PDF]
RCCA:Learning Query and Image Similarities with Ranking Canonical Correlation Analysis(ICCV) [PDF]

2014

MCCA:A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics(IJCV) [PDF]

2013

KCCA:Framing Image Description as a Ranking Task Data, Models and Evaluation Metrics(JAIR) [PDF]
DCCA:Deep Canonical Correlation Analysis(ICML) [PDF] [Code]

2012

CR:Continuum Regression for Cross-modal Multimedia Retrieval(ICIP) [PDF]

2010

CCA:A New Approach to Cross-Modal Multimedia Retrieval(MM) [PDF][Code]

2.3.1.2 Topic Model

2011

MDRF:Learning Cross-modality Similarity for Multinomial Data(ICCV) [PDF]

2010

tr-mmLDA:Topic Regression Multi-Modal Latent Dirichlet Allocation for Image Annotation(CVPR) [PDF]

2003

Corr-LDA:Modeling Annotated Data(SIGIR) [PDF]

2.3.1.3 Other Shallow

2013

Bi-CMSRM:Cross-Media Semantic Representation via Bi-directional Learning to Rank(MM) [PDF]
CTM:Cross-media Topic Mining on Wikipedia(MM) [PDF]

2012

CoCA:Dimensionality Reduction on Heterogeneous Feature Space(ICDM) [PDF]

2011

MCU:Maximum Covariance Unfolding: Manifold Learning for Bimodal Data(NIPS) [PDF]

2008

PAMIR:A Discriminative Kernel-Based Model to Rank Images from Text Queries(TPAMI) [PDF]

2003

CFA:Multimedia Content Processing through Cross-Modal Association(MM) [PDF]

2.3.1.4 Neural Network

2018

CDPAE:Comprehensive Distance-Preserving Autoencoders for Cross-Modal Retrieval(MM) [PDF][Code]

2016

CMDN:Cross-Media Shared Representation by Hierarchical Learning with Multiple Deep Networks(IJCAI) [PDF][Code]
MSAE:Effective deep learning-based multi-modal retrieval(VLDB) [PDF]

2014

Corr-AE:Cross-modal Retrieval with Correspondence Autoencoder(MM) [PDF]

2013

RGDBN:Latent Feature Learning in Social Media Network(MM) [PDF]

2012

MDBM:Multimodal Learning with Deep Boltzmann Machines(NIPS) [PDF]

2.3.2 Image-text matching retrieval

2.3.2.1 Naive Network

2022

UWML:Universal Weighting Metric Learning for Cross-Modal Retrieval (TPAMI) [PDF][Code]
LESS:Learning to Embed Semantic Similarity for Joint Image-Text Retrieval (TPAMI)[PDF]
CMCM:Cross-Modal Coherence for Text-to-Image Retrieval (AAAI) [PDF]
P2RM:Point to Rectangle Matching for Image Text Retrieval(MM) [PDF]

2020

DPCITE:Dual-path Convolutional Image-Text Embeddings with Instance Loss(TOMM) [PDF] [Code]
PSN:Preserving Semantic Neighborhoods for Robust Cross-Modal Retrieval(ECCV) [PDF] [Code]

2019

LDR:Learning Disentangled Representation for Cross-Modal Retrieval with Deep Mutual Information Estimation(MM) [PDF]

2018

CHAIN-VSE:Bidirectional Retrieval Made Simple(CVPR) [PDF] [Code]

2017

CRC:Cross-media Relevance Computation for Multimedia Retrieval(MM) [PDF]
VSE++: Improving Visual-Semantic Embeddings with Hard Negatives(Arxiv) [PDF][Code]
RRF-Net:Learning a Recurrent Residual Fusion Network for Multimodal Matching(ICCV) [PDF][Code]

2016

DBRLM:Cross-Modal Retrieval via Deep and Bidirectional Representation Learning(TMM) [PDF]

2015

MSDS:Image-Text Cross-Modal Retrieval via Modality-Specific Feature Learning(ICMR) [PDF]

2014

DT-RNN:Grounded Compositional Semantics for Finding and Describing Images with Sentences(TACL) [PDF]

2.3.2.2 Dot-product Attention

2020

SMAN: Stacked Multimodal Attention Network for Cross-Modal Image-Text Retrieval(TC) [PDF]
CAAN:Context-Aware Attention Network for Image-Text Retrieval(CVPR) [PDF]
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval(CVPR) [PDF] [Code]

2019

PFAN:Position Focused Attention Network for Image-Text Matching (IJCAI) [PDF][Code]
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval(ICCV) [PDF] [Code]
CMRSC:Cross-Modal Image-Text Retrieval with Semantic Consistency(MM) [PDF] [Code]

2018

MCSM:Modality-specific Cross-modal Similarity Measurement with Recurrent Attention Network(TIP) [PDF][Code]
DSVEL:Finding beans in burgers: Deep semantic-visual embedding with localization(CVPR) [PDF][Code]
CRAN:Cross-media Multi-level Alignment with Relation Attention Network(IJCAI)[PDF]
SCAN:Stacked Cross Attention for Image-Text Matching(ECCV) [PDF] [Code]

2017

sm-LSTM:Instance-aware Image and Sentence Matching with Selective Multimodal LSTM(CVPR) [PDF]

2.3.2.3 Graph Model

2022

LHSC:Learning Hierarchical Semantic Correspondences for Cross-Modal Image-Text Retrieval(ICMR) [PDF]
IFRFGF:Improving Fusion of Region Features and Grid Features via Two-Step Interaction for Image-Text Retrieval(MM) [PDF]
CODER:Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval(ECCV) [PDF]

2021

HSGMP: Heterogeneous Scene Graph Message Passing for Cross-modal Retrieval(ICMR) [PDF]
WCGL:Wasserstein Coupled Graph Learning for Cross-Modal Retrieval(ICCV)[PDF]

2020

DSRAN:Learning Dual Semantic Relations with Graph Attention for Image-Text Matching(TCSVT) [PDF] [Code]
VSM:Visual-Semantic Matching by Exploring High-Order Attention and Distraction(CVPR) [PDF]
SGM:Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval(WACV) [PDF]

2019

KASCE:Knowledge Aware Semantic Concept Expansion for Image-Text Matching(IJCAI) [PDF]
VSRN:Visual Semantic Reasoning for Image-Text Matching(ICCV) [PDF] [Code]

2.3.2.4 Transformer

2022

DREN:Dual-Level Representation Enhancement on Characteristic and Context for Image-Text Retrieval(TCSVT) [PDF]
M2D-BERT:Multi-scale Multi-modal Dictionary BERT For Effective Text-image Retrieval in Multimedia Advertising(CIKM) [PDF]
ViSTA:Vision and Scene Text Aggregation for Cross-Modal Retrieval(CVPR) [PDF]
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval(CVPR) [PDF]
EI-CLIP: Entity-aware Interventional Contrastive Learning for E-commerce Cross-modal Retrieval(CVPR) [PDF]
SSAMT:Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval(ICMR) [PDF]
TEAM:Token Embeddings Alignment for Cross-Modal Retrieval(MM) [PDF]
CAliC: Accurate and Efficient Image-Text Retrieval via Contrastive Alignment and Visual Contexts Modeling(MM) [PDF]

2021

GRAN:Global Relation-Aware Attention Network for Image-Text Retrieval(ICMR) [PDF]
PCME:Probabilistic Embeddings for Cross-Modal Retrieval(CVPR) [PDF] [Code]

2020

FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval(SIGIR) [PDF]

2019

PVSE:Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval(CVPR) [PDF] [Code]

2.3.2.5 Cross-modal Generation

2022

PCMDA:Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval(MM)[PDF]

2021

CRGN:Deep Relation Embedding for Cross-Modal Retrieval(TIP) [PDF][Code]
X-MRS:Cross-Modal Retrieval and Synthesis (X-MRS): Closing the Modality Gap in Shared Representation Learning(MM) [PDF][Code]

2020

AACR:Augmented Adversarial Training for Cross-Modal Retrieval(TMM) [PDF] [Code]

2018

LSCO:Learning Semantic Concepts and Order for Image and Sentence Matching(CVPR) [PDF]
TCCM:Towards Cycle-Consistent Models for Text and Image Retrieval(CVPR) [PDF]
GXN:Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models(CVPR) [PDF]

2017

2WayNet:Linking Image and Text with 2-Way Nets(CVPR) [PDF]

2015

DVSA:Deep Visual-Semantic Alignments for Generating Image Descriptions(CVPR) [PDF]

2.4 Supervised cross-modal real-valued retrieval

2.4.1 Supervised shallow cross-modal real-valued retrieval

2.4.1.1 CCA

2022

MVMLCCA: Multi-view Multi-label Canonical Correlation Analysis for Cross-modal Matching and Retrieval(CVPRW) [PDF] [Code]

2015

ml-CCA: Multi-Label Cross-modal Retrieval(ICCV) [PDF] [Code]

2014

cluster-CCA: Cluster Canonical Correlation Analysis(ICAIS) [PDF]

2012

GMA: Generalized Multiview Analysis: A Discriminative Latent Space(CVPR) [PDF] [Code]

2.4.1.2 Dictionary Learning

2018

JDSLC: Joint Dictionary Learning and Semantic Constrained Latent Subspace Projection for Cross-Modal Retrieval(CIKM) [PDF]

2016

DDL: Discriminative Dictionary Learning With Common Label Alignment for Cross-Modal Retrieval(TMM) [PDF]

2014

CMSDL: Cross-Modality Submodular Dictionary Learning for Information Retrieval(CIKM) [PDF]

2013

SliM2: Supervised Coupled Dictionary Learning with Group Structures for Multi-Modal Retrieval(AAAI) [PDF]

2.4.1.3 Feature Mapping

2017

MDSSL: Cross-Modal Retrieval Using Multiordered Discriminative Structured Subspace Learning(TMM) [PDF]
JLSLR: Joint Latent Subspace Learning and Regression for Cross-Modal Retrieval(SIGIR) [PDF]

2016

JFSSL: Joint Feature Selection and Subspace Learning for Cross-Modal Retrieval(TPAMI) [PDF] [Code]
MDCR: Modality-Dependent Cross-Media Retrieval(TIST) [PDF]
CRLC: Cross-modal Retrieval with Label Completion(MM) [PDF]

2013

JGRHML: Heterogeneous Metric Learning with Joint Graph Regularization for Cross-Media Retrieval(AAAI) [PDF] [Code]
LCFS: Learning Coupled Feature Spaces for Cross-modal Matching(ICCV) [PDF]

2011

Multi-NPP: Learning Multi-View Neighborhood Preserving Projections(ICML) [PDF]

2.4.1.4 Topic Model

2014

M3R: Multi-modal Mutual Topic Reinforce Modeling for Cross-media Retrieval(MM) [PDF]
NPBUS: Nonparametric Bayesian Upstream Supervised Multi-Modal Topic Models(WSDM) [PDF]

2.4.1.5 Other Shallow

2019

CMOS: Online Asymmetric Metric Learning With Multi-Layer Similarity Aggregation for Cross-Modal Retrieval(TIP) [PDF]

2017

CMOS: Online Asymmetric Similarity Learning for Cross-Modal Retrieval(CVPR) [PDF]

2016

PL-ranking: A Novel Ranking Method for Cross-Modal Retrieval(MM) [PDF]
RL-PLS: Cross-modal Retrieval by Real Label Partial Least Squares(MM) [PDF]

2013

PFAR: Parallel Field Alignment for Cross Media Retrieval(MM) [PDF]

2.4.2 Supervised deep cross-modal real-valued retrieval

2.4.2.1 Naive Network

2022

C3CMR: Cross-Modality Cross-Instance Contrastive Learning for Cross-Media Retrieval(MM) [PDF]

2020

ED-Net: Event-Driven Network for Cross-Modal Retrieval(CIKM) [PDF]

2019

DSCMR: Deep Supervised Cross-modal Retrieval(CVPR) [PDF] [Code]
SAM: Cross-Modal Subspace Learning with Scheduled Adaptive Margin Constraints(MM) [PDF]

2017

deep-SM: Cross-Modal Retrieval With CNN Visual Features: A New Baseline(TCYB) [PDF] [Code]
CCL: Cross-modal Correlation Learning With Multigrained Fusion by Hierarchical Network(TMM) [PDF]
MSFN: Cross-media Retrieval by Learning Rich Semantic Embeddings of Multimedia(MM) [PDF]
MNiL: Multi-Networks Joint Learning for Large-Scale Cross-Modal Retrieval(MM) [PDF] [Code]

2016

MDNN: Effective deep learning-based multi-modal retrieval(VLDB) [PDF]

2015

RE-DNN: Deep Semantic Mapping for Cross-Modal Retrieval(ICTAI) [PDF]
C2MLR: Deep Compositional Cross-modal Learning to Rank via Local-Global Alignment(MM) [PDF]

2.4.2.2 GAN

2022

JFSE: Joint Feature Synthesis and Embedding: Adversarial Cross-Modal Retrieval Revisited(TPAMI) [PDF] [Code]

2021

AACR: Augmented Adversarial Training for Cross-Modal Retrieval(TMM) [PDF] [Code]

2018

CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning(TMM) [PDF] [Code]

2017

ACMR: Adversarial Cross-Modal Retrieval(MM) [PDF] [Code]

2.4.2.3 Graph Model

2022

AGCN: Adversarial Graph Convolutional Network for Cross-Modal Retrieval(TCSVT) [PDF]
ALGCN: Adaptive Label-Aware Graph Convolutional Networks for Cross-Modal Retrieval(TMM) [PDF]
HGE: Cross-Modal Retrieval with Heterogeneous Graph Embedding(MM) [PDF]

2021

GCR: Exploring Graph-Structured Semantics for Cross-Modal Retrieval(MM) [PDF] [Code]
DAGNN: Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval(AAAI) [PDF]

2018

SSPE: Learning Semantic Structure-preserved Embeddings for Cross-modal Retrieval(MM) [PDF]

2.4.2.4 Transformer

2021

RLCMR: Rethinking Label-Wise Cross-Modal Retrieval from A Semantic Sharing Perspective(IJCAI) [PDF]

2.5 Cross-modal retrieval under special retrieval scenarios

2.5.1 Semi-Supervised (Real-valued)

2020

SSCMR:Semi-Supervised Cross-Modal Retrieval With Label Prediction(TMM) [PDF]

2019

A3VSE:Annotation Efficient Cross-Modal Retrieval with Adversarial Attentive Alignment(MM) [PDF]
ASFS:Adaptive Semi-Supervised Feature Selection for Cross-Modal Retrieval(TMM) [PDF]

2018

GSS-SL:Generalized Semi-supervised and Structured Subspace Learning for Cross-Modal Retrieval(TMM) [PDF]

2017

SSDC:Semi-supervised Distance Consistent Cross-modal Retrieval(VSCC)[PDF]

2013

JRL:Learning Cross-Media Joint Representation With Sparse and Semisupervised Regularization(TCSVT) [PDF][Code]

2012

MVML-GL:Multiview Metric Learning with Global Consistency and Local Smoothness(TIST) [PDF]

2.5.2 Semi-Supervised (Hashing)

2020

SCH-GAN:Semi-Supervised Cross-Modal Hashing by Generative Adversarial Network(TC) [PDF] [Code]
SGCH:Semi-supervised graph convolutional hashing network for large-scale cross-modal retrieval(ICIP) [PDF]

2019

SSDQ:Semi-supervised Deep Quantization for Cross-modal Search(MM) [PDF]
S3PH:Semi-supervised semantic-preserving hashing for efficient cross-modal retrieval(ICME) [PDF]

2017

AUSL:Adaptively Unified Semi-supervised Learning for Cross-Modal Retrieval(IJCAI) [PDF]

2016

NPH:Neighborhood-Preserving Hashing for Large-Scale Cross-Modal Search(MM) [PDF]

2.5.3 Imbalance (Real-valued)

2021

PAN: Prototype-based Adaptive Network for Robust Cross-modal Retrieval(SIGIR) [PDF]
MCCN: Multimodal Coordinated Clustering Network for Large-Scale Cross-modal Retrieval(MM) [PDF]

2020

DAVAE:Incomplete Cross-modal Retrieval with Dual-Aligned Variational Autoencoders(MM) [PDF]

2015

SCDL:Semi-supervised Coupled Dictionary Learning for Cross-modal Retrieval in Internet Images and Texts(MM) [PDF]
LGCFL:Learning Consistent Feature Representation for Cross-Modal Multimedia Retrieval(TMM) [PDF]

2.5.4 Imbalance (Hashing)

2020

RUCMH:Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval(TOIS) [PDF]
ATFH-N:Adversarial Tri-Fusion Hashing Network for Imbalanced Cross-Modal Retrieval(TETCI) [PDF]
FlexCMH:Flexible Cross-Modal Hashing(TNNLS) [PDF]

2019

TFNH:Triplet Fusion Network Hashing for Unpaired Cross-Modal Retrieval(ICMR) [PDF] [Code]
CALM:Collective Affinity Learning for Partial Cross-Modal Hashing(TIP) [PDF]
MTFH: A Matrix Tri-Factorization Hashing Framework for Efficient Cross-Modal Retrieval(TIP) [PDF] [Code]
GSPH:Generalized Semantic Preserving Hashing for Cross-Modal Retrieval(TIP) [PDF]

2018

DAH:Dense Auto-Encoder Hashing for Robust Cross-Modality Retrieval(MM) [PDF]

2017

GSPH:Generalized Semantic Preserving Hashing for n-Label Cross-Modal Retrieval(CVPR) [PDF] [Code]

2.5.5 Incremental

2021

MARS: Learning Modality-Agnostic Representation for Scalable Cross-Media Retrieval(TCSVT) [PDF]
CCMR:Continual learning in cross-modal retrieval(CVPR) [PDF]
SCML:Real-world Cross-modal Retrieval via Sequential Learning(TMM) [PDF]

2020

ATTL-CEL:Adaptive Temporal Triplet-loss for Cross-modal Embedding Learning(MM)[PDF]

2019

SVHNs:Separated Variational Hashing Networks for Cross-Modal Retrieval(MM) [PDF]
ECMH:Extensible Cross-Modal Hashing(IJCAI) [PDF] [Code]

2018

TempXNet:Temporal Cross-Media Retrieval with Soft-Smoothing(MM) [PDF]

2.5.6 Noise

2022

DECL: Deep Evidential Learning with Noisy Correspondence for Cross-modal Retrieval(MM) (PDF) [Code]
ELRCMR: Early-Learning regularized Contrastive Learning for Cross-Modal Retrieval with Noisy Labels(MM) (PDF)
CMMQ: Mutual Quantization for Cross-Modal Search with Noisy Labels(CVPR) (PDF)

2021

MRL: Learning Cross-Modal Retrieval with Noisy Labels(CVPR) (PDF) [Code]

2018

WSJE: Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval(MM) (PDF)

2.5.7 Cross-Domain

2021

M2GUDA: Multi-Metrics Graph-Based Unsupervised Domain Adaptation for Cross-Modal Hashing(ICMR) (PDF)
ACP: Adaptive Cross-Modal Prototypes for Cross-Domain Visual-Language Retrieval(CVPR) (PDF)

2020

DASG: Unsupervised Cross-Media Retrieval Using Domain Adaptation With Scene Graph(TCSVT) (PDF)

2.5.8 Zero-Shot

2020

LCALE: Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval(AAAI) (PDF)
CFSA: Correlated Features Synthesis and Alignment for Zero-shot Cross-modal Retrieval(SIGIR) (PDF)

2019

ZS-CMR: Learning Cross-Aligned Latent Embeddings for Zero-Shot Cross-Modal Retrieval(TIP) (PDF)

2.5.9 Few-Shot

2021

SOCMH: Know Yourself and Know Others: Efficient Common Representation Learning for Few-shot Cross-modal Retrieval(ICMR) (PDF)

2.5.10 Online Learning

2020

CMOLRS: Online Fast Adaptive Low-Rank Similarity Learning for Cross-Modal Retrieval(TMM) (PDF) [Code]
LEMON: Label Embedding Online Hashing for Cross-Modal Retrieval(MM) (PDF) [Code]

2019

FOMH: Flexible Online Multi-modal Hashing for Large-scale Multimedia Retrieval(MM) (PDF) [Code]

2017

OCMSR: Online Cross-Modal Scene Retrieval by Binary Representation and Semantic Graph(MM) (PDF)

2016

OCMH: Online cross-modal hashing for web image retrieval(AAAI) (PDF)

2.5.11 Hierarchical

2020

SHDCH: Supervised Hierarchical Deep Hashing for Cross-Modal Retrieval(MM) (PDF) [Code]

2019

HiCHNet: Supervised Hierarchical Cross-Modal Hashing(SIGIR) (PDF) [Code]

2.5.12 Fine-grained

2022

PCMDA: Paired Cross-Modal Data Augmentation for Fine-Grained Image-to-Text Retrieval(MM) (PDF)

2019

FGCrossNet: A New Benchmark and Approach for Fine-grained Cross-media Retrieval(MM) (PDF) [Code]

3. Usage

3.1 Datasets

Graph Model--GCR

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1YmW8Zz2uK3AgCs6pDEoA8A?pwd=21xh
Code: 21xh

Unsupervised cross-modal real-valued

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1hBNo8gBSyLbik0ka1POhiQ
Code: cc53

Quantization--CDQ

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1mO1hdsJR2FN5xEAv2e7eaw?pwd=us9v
Code: us9v

GAN--CPAH

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/145Zool0FUb3758EeSxtHBw?pwd=mxt7
Code: mxt7

Transformer--DCHMT

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1UHr2NVjFkTjLXXQ8Izy5WA?pwd=qfsj
Code: qfsj

Feature Mapping(Sample Constraint)(Label Constraint)--MDBE

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/15BtQ_Zz7UihZBW6KXTTodA?pwd=ir7g
Code: ir7g

Feature Mapping(Sample Constraint)(Common Hamming)--RoPH

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1_uIulkuxcIcubvl5u3zsOA?pwd=46c4
Code: 46c4

Hierarchical--SHDCH

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1-CsIJbvz3IFsmDgYk9BwYg?pwd=7hd8
Code: 7hd8

Noise--MRL

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1FIrB-gXJa9VHKzLRQZf30Q?pwd=g3qt
Code: g3qt

Online learning--LEMON

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1s5SnnAXo5wK7cmRs3zNq4w?pwd=jxjo
Code: jxjo

Fine-grained--FGCrossNet

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1OYxCLmNKvPzwLIs5snTOlA?pwd=r80g
Code: r80g

Noise--DECL

Dataset Link:

Baidu Yun Link: https://pan.baidu.com/s/1FcxkwOuuiUXnIl1LAatDLA?pwd=nl2z
Code: nl2z
