cross-modal-retrieval

There are 80 repositories under the cross-modal-retrieval topic.
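
Many of these repositories (clip-as-service in particular) build on the same core recipe: encode images and text into a shared embedding space and rank candidates by cosine similarity. A minimal sketch of that retrieval step, using random vectors as stand-ins for the outputs of real CLIP-style image and text encoders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for CLIP-style encoder outputs: in practice these would
# come from the image and text towers of a pretrained vision-language model.
image_embeddings = rng.normal(size=(5, 64))                            # 5 images, 64-d joint space
text_embeddings = image_embeddings + 0.05 * rng.normal(size=(5, 64))   # their matching captions

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

img = l2_normalize(image_embeddings)
txt = l2_normalize(text_embeddings)

# Cosine similarity between every caption and every image.
similarity = txt @ img.T            # shape (5 texts, 5 images)

# Text-to-image retrieval: for each caption, rank images by similarity.
ranked = np.argsort(-similarity, axis=1)
print(ranked[:, 0])                 # top-1 image index per caption
```

In a real system only the two embedding matrices change (they come from the pretrained model); the normalization, similarity, and ranking steps stay the same.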

  • jina-ai/clip-as-service

    🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

    Language: Python
  • YehLi/xmodaler

    X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

    Language: Python
  • Paranioar/Awesome_Matching_Pretraining_Transfering

    A paper list covering large multi-modality models (perception, generation, unification), parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.

  • zjukg/KG-MM-Survey

    Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

  • layumi/Image-Text-Embedding

    TOMM2020 Dual-Path Convolutional Image-Text Embedding :feet: https://arxiv.org/abs/1711.05535

    Language: MATLAB
  • slavabarkov/tidy

    Offline semantic Text-to-Image and Image-to-Image search on Android powered by quantized state-of-the-art vision-language pretrained CLIP model and ONNX Runtime inference engine

    Language: Kotlin
  • Paranioar/SGRAF

    [AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”

    Language: Python
  • woodfrog/vse_infty

    Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021 (Oral)

    Language: Python
  • penghu-cs/DSCMR

    Deep Supervised Cross-modal Retrieval (CVPR 2019, PyTorch Code)

    Language: Python
  • yalesong/pvse

    Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019)

    Language: Python
  • naver-ai/pcme

    Official PyTorch implementation of "Probabilistic Cross-Modal Embedding" (CVPR 2021)

    Language: Python
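
pcme (and its follow-up pcmepp below) represents each image and caption as a probability distribution in the joint space rather than a single point, so a match score can be estimated by sampling. A toy numpy sketch of that scoring idea (the Gaussians, dimensions, and sample counts here are made up for illustration; the papers learn the means and variances with a soft contrastive objective):

```python
import numpy as np

rng = np.random.default_rng(2)
D, K = 32, 8   # embedding dimension, samples drawn per item

# Hypothetical probabilistic embeddings: each item is a diagonal Gaussian
# (mean, std) in the joint space instead of a single deterministic vector.
img_mu, img_sigma = rng.normal(size=D), 0.1 * np.ones(D)
txt_mu, txt_sigma = img_mu + 0.05 * rng.normal(size=D), 0.1 * np.ones(D)

def sample(mu, sigma, k, rng):
    """Draw k samples from a diagonal Gaussian."""
    return mu + sigma * rng.normal(size=(k, mu.shape[0]))

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Monte-Carlo match score: average cosine similarity over sampled pairs.
img_samples = l2_normalize(sample(img_mu, img_sigma, K, rng))
txt_samples = l2_normalize(sample(txt_mu, txt_sigma, K, rng))
score = (txt_samples @ img_samples.T).mean()
print(round(score, 3))
```

The variance lets the model express ambiguity (a polysemous caption can overlap several images), which point embeddings cannot do.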
  • jpthu17/DiffusionRet

    [ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

    Language: Python
  • jpthu17/EMCL

    [NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

    Language: Python
  • howard-hou/BagFormer

    PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction

    Language: Python
  • ilaria-manco/muscall

    Official implementation of "Contrastive Audio-Language Learning for Music" (ISMIR 2022)

    Language: Python
  • jpthu17/HBI

    [CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

    Language: Python
  • AyanKumarBhunia/on-the-fly-FGSBIR

    [CVPR 2020, Oral] "Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval", IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.

    Language: Python
  • naver-ai/eccv-caption

    Extended COCO Validation (ECCV) Caption dataset (ECCV 2022)

    Language: Python
  • naver-ai/pcmepp

    Official PyTorch implementation of "Improved Probabilistic Image-Text Representations" (ICLR 2024)

    Language: Python
  • penghu-cs/UCCH

    Unsupervised Contrastive Cross-modal Hashing (IEEE TPAMI 2023, PyTorch Code)

    Language: Python
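
Hashing-based entries such as UCCH (and AGAH further down) map both modalities to compact binary codes, so retrieval reduces to a cheap Hamming-distance lookup. A minimal sketch of that lookup, with random features standing in for the outputs of a learned hashing network:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical continuous features from two modalities, already mapped into
# a shared space (stand-ins for a trained cross-modal hashing network).
image_feats = rng.normal(size=(4, 16))
text_feats = image_feats + 0.05 * rng.normal(size=(4, 16))

# Binarize by sign to obtain compact {0, 1} hash codes (16 bits per item).
image_codes = (image_feats > 0).astype(np.uint8)
text_codes = (text_feats > 0).astype(np.uint8)

# Hamming distance between every text code and every image code.
hamming = (text_codes[:, None, :] != image_codes[None, :, :]).sum(axis=2)

# Text-to-image retrieval: nearest image code for each caption.
nearest = hamming.argmin(axis=1)
print(nearest)
```

The appeal is storage and speed: a 16-bit code replaces a float vector, and Hamming distance is a popcount on modern hardware, which is what makes these methods scale to very large galleries.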
  • penghu-cs/MRL

    Learning Cross-Modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code)

    Language: Python
  • jpthu17/DiCoSA

    [IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

    Language: Python
  • ailab-kyunghee/CM2_DVC

    [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval

    Language: Python
  • BrandonHanx/TextReID

    [BMVC 2021] Text-Based Person Search with Limited Data

    Language: Python
  • mako443/Text2Pos-CVPR2022

    Code, dataset and models for our CVPR 2022 publication "Text2Pos"

    Language: Python
  • LivXue/GNN4CMR

    PyTorch implementation of the AAAI-21 paper "Dual Adversarial Label-aware Graph Neural Networks for Cross-modal Retrieval" and the TPAMI-22 paper "Integrating Multi-Label Contrastive Learning with Dual Adversarial Graph Neural Networks for Cross-Modal Retrieval".

    Language: Python
  • WendellGul/AGAH

    Source code for paper "Adversary Guided Asymmetric Hashing for Cross-Modal Retrieval".

    Language: Python
  • penghu-cs/SDML

    Scalable deep multimodal learning for cross-modal retrieval (SIGIR 2019, PyTorch Code)

    Language: Python
  • knightyxp/DGL

    [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.

    Language: Python
  • kyuyeonpooh/objects-that-sound

    An unofficial implementation of the paper "Objects that Sound" (ECCV 2018).

    Language: Python
  • penghu-cs/MAN

    Multimodal Adversarial Network for Cross-modal Retrieval (PyTorch Code)

    Language: Python
  • Paranioar/RCAR

    [TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”

    Language: Python
  • idealwhite/VLDeformer

    PyTorch implementation of the paper "VLDeformer: Vision Language Decomposed Transformer for Fast Cross-modal Retrieval", KBS 2022

    Language: Jupyter Notebook
  • xiaoyuan1996/SemanticLocalizationMetrics

    The first research work on semantic localization

    Language: Python
  • MartinYuanNJU/SEMScene

    Code implementation of paper "SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval".

    Language: Python
  • ict-bigdatalab/VNEL

    Dataset and code for EMNLP 2022 "Visual Named Entity Linking: A New Dataset and A Baseline"