cross-modal-retrieval

There are 80 repositories under the cross-modal-retrieval topic.

jina-ai/clip-as-service
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Language: Python 12.5k 222 6112.1k
YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Language: Python 1k 36 62111
Paranioar/Awesome_Matching_Pretraining_Transfering
A paper list on Large Multi-Modality Models (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, and Conventional Image-Text Matching, for preliminary insight.
406 12 547
zjukg/KG-MM-Survey
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
361 7 018
layumi/Image-Text-Embedding
TOMM2020 Dual-Path Convolutional Image-Text Embedding 🐾 https://arxiv.org/abs/1711.05535
Language: MATLAB 287 12 1873
slavabarkov/tidy
Offline semantic Text-to-Image and Image-to-Image search on Android powered by quantized state-of-the-art vision-language pretrained CLIP model and ONNX Runtime inference engine
Language: Kotlin 250 7 2520
Paranioar/SGRAF
[AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”
Language: Python 214 5 1936
woodfrog/vse_infty
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021 (Oral)
Language: Python 156 4 1016
penghu-cs/DSCMR
Deep Supervised Cross-modal Retrieval (CVPR 2019, PyTorch Code)
Language: Python 141 5 1226
yalesong/pvse
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019)
Language: Python 134 4 1824
naver-ai/pcme
Official PyTorch implementation of "Probabilistic Cross-Modal Embedding" (CVPR 2021)
Language: Python 126 4 1017
jpthu17/DiffusionRet
[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Language: Python 125 3 106
jpthu17/EMCL
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Language: Python 124 3 49
howard-hou/BagFormer
PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction
Language: Python 115 30 033
ilaria-manco/muscall
Official implementation of "Contrastive Audio-Language Learning for Music" (ISMIR 2022)
Language: Python 109 7 411
jpthu17/HBI
[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Language: Python 109 4 95
AyanKumarBhunia/on-the-fly-FGSBIR
[CVPR 2020, Oral] "Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval", IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
Language: Python 57 4 916
naver-ai/eccv-caption
Extended COCO Validation (ECCV) Caption dataset (ECCV 2022)
Language: Python 56 2 42
naver-ai/pcmepp
Official PyTorch implementation of "Improved Probabilistic Image-Text Representations" (ICLR 2024)
Language: Python 53 3 91
penghu-cs/UCCH
Unsupervised Contrastive Cross-modal Hashing (IEEE TPAMI 2023, PyTorch Code)
Language: Python 52 3 2510
penghu-cs/MRL
Learning Cross-Modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code)
Language: Python 51 3 109
jpthu17/DiCoSA
[IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
Language: Python 49 2 102
ailab-kyunghee/CM2_DVC
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Language: Python 48 1 72
BrandonHanx/TextReID
[BMVC 2021] Text-Based Person Search with Limited Data
Language: Python 44 2 135
mako443/Text2Pos-CVPR2022
Code, dataset and models for our CVPR 2022 publication "Text2Pos"
Language: Python 41 3 127
LivXue/GNN4CMR
PyTorch implementation of the AAAI-21 paper "Dual Adversarial Label-aware Graph Neural Networks for Cross-modal Retrieval" and the TPAMI-22 paper "Integrating Multi-Label Contrastive Learning with Dual Adversarial Graph Neural Networks for Cross-Modal Retrieval".
Language: Python 38 2 04
WendellGul/AGAH
Source code for paper "Adversary Guided Asymmetric Hashing for Cross-Modal Retrieval".
Language: Python 36 2 711
penghu-cs/SDML
Scalable deep multimodal learning for cross-modal retrieval (SIGIR 2019, PyTorch Code)
Language: Python 33 2 313
knightyxp/DGL
[AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.
Language: Python 32 1 41
kyuyeonpooh/objects-that-sound
An unofficial implementation of the paper "Objects that Sound" (ECCV 2018).
Language: Python 32 5 34
penghu-cs/MAN
Multimodal Adversarial Network for Cross-modal Retrieval (PyTorch Code)
Language: Python 30 2 26
Paranioar/RCAR
[TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”
Language: Python 29 1 13
idealwhite/VLDeformer
PyTorch implementation of the paper "VLDeformer: Vision Language Decomposed Transformer for Fast Cross-modal Retrieval", KBS 2022
Language: Jupyter Notebook 26 2 04
xiaoyuan1996/SemanticLocalizationMetrics
The first research on semantic localization
Language: Python 26 3 45
MartinYuanNJU/SEMScene
Code implementation of paper "SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval".
Language: Python 25 1 21
ict-bigdatalab/VNEL
Dataset and code for EMNLP 2022 "Visual Named Entity Linking: A New Dataset and A Baseline"
24 1 22
