image-text-retrieval

There are 35 repositories under the image-text-retrieval topic.

  • OpenGVLab/InternVL

    [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.

    Language: Python · 7.4k stars
  • salesforce/BLIP

    PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

    Language: Jupyter Notebook · 5.1k stars
  • OFA-Sys/Chinese-CLIP

    Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

    Language: Python · 5k stars
  • Paranioar/Awesome_Matching_Pretraining_Transfering

    A paper list covering large multi-modality models (perception, generation, unification), parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.

  • slavabarkov/tidy

    Offline semantic text-to-image and image-to-image search on Android, powered by a quantized state-of-the-art vision-language pretrained CLIP model and the ONNX Runtime inference engine.

    Language: Kotlin
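tidy's description above (CLIP embeddings plus an ONNX Runtime engine) boils down to nearest-neighbor search over embedding vectors. A minimal numpy sketch of that retrieval step, assuming the text and image embeddings have already been computed; the toy 4-dimensional vectors below stand in for real CLIP outputs and are not from the repo:

```python
import numpy as np

def cosine_retrieve(query_emb, image_embs, top_k=3):
    """Rank gallery images by cosine similarity to a query embedding.

    query_emb: (d,) text embedding; image_embs: (n, d) image embeddings.
    Returns the indices of the top_k most similar images.
    """
    q = query_emb / np.linalg.norm(query_emb)
    g = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = g @ q                       # cosine similarity per gallery image
    return np.argsort(-sims)[:top_k]   # most similar first

# Toy 4-dim vectors standing in for real CLIP embeddings.
gallery = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0, 0.0]])
query = np.array([1.0, 0.05, 0.0, 0.0])
print(cosine_retrieve(query, gallery))  # [0 2 1]
```

In an app like tidy, the embeddings would come from a quantized CLIP model run through ONNX Runtime; the ranking step itself stays this simple.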
  • greyovo/PicQuery

    🔍 Search local images with natural language on Android, powered by OpenAI's CLIP model.

    Language: Kotlin
  • Paranioar/SGRAF

    [AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”

    Language: Python
  • chuhaojin/Text2Poster-ICASSP-22

    Official implementation of the ICASSP-2022 paper "Text2Poster: Laying Out Stylized Texts on Retrieved Images"

    Language: Python
  • alipay/Ant-Multi-Modal-Framework

    Research Code for Multimodal-Cognition Team in Ant Group

    Language: Python
  • howard-hou/BagFormer

    PyTorch code for BagFormer: Better Cross-Modal Retrieval via Bag-wise Interaction

    Language: Python
  • X-PLUG/mPLUG

    mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)

    Language: Python
  • hpc203/Chinese-CLIP-opencv-onnxrun

    Deploy Chinese-CLIP with OpenCV and onnxruntime for text-to-image search: describe the desired picture in a sentence, and matching images are retrieved from the gallery. Includes both C++ and Python versions of the program.

    Language: C++
  • MILVLG/rosita

    ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

    Language: Python
  • cobanov/image-captioning

    Image captioning using Python and BLIP

    Language: Python
  • eric-ai-lab/ComCLIP

    Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"

    Language: Python
  • eric-ai-lab/CPL

    Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"

    Language: Python
  • Paranioar/RCAR

    [TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”

    Language: Python
  • ytaek-oh/fsc-clip

    [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality

    Language: Python
  • alipay/PC2-NoiseofWeb

    Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark containing 100K image-text pairs for robust image-text matching/retrieval models.

    Language: Python
  • frank-chris/ImageTextRetrieval

    We implement several cross-modal learning schemes (Siamese Network, Correlational Network, and Deep Cross-Modal Projection Learning) and study their performance. We also propose a modified Deep Cross-Modal Projection Learning model that uses a different image feature extractor, and evaluate it on image-text retrieval over a fashion clothing dataset.

    Language: Jupyter Notebook
  • kaylode/tern

    Cross-modal retrieval using Transformer Encoder Reasoning Networks (TERN), with metric learning and FAISS for fast similarity search on GPU.

    Language: Jupyter Notebook
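The TERN entry above pairs metric learning with FAISS-based similarity search. As a minimal sketch of the metric-learning side, here is a hinge-style triplet loss in plain numpy; the vectors and margin below are illustrative, not taken from the repo:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: the positive should be closer to the
    anchor than the negative, by at least `margin` (Euclidean distances)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# An easy negative is already far enough away: loss is zero.
a, p = np.array([1.0, 0.0]), np.array([0.9, 0.1])
print(triplet_loss(a, p, np.array([0.0, 1.0])))      # 0.0
# A hard negative inside the margin produces a positive loss.
print(triplet_loss(a, p, np.array([0.8, 0.2])) > 0)  # True
```

After training embeddings with an objective like this, the learned vectors can be indexed with FAISS (e.g. an inner-product flat index) for fast retrieval.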
  • Paranioar/DBL

    [TIP2024] The code of “Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching”

    Language: Python
  • marialymperaiou/knowledge-enhanced-multimodal-learning

    A list of research papers on knowledge-enhanced multimodal learning

  • BUAADreamer/CCRK

    [KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning

    Language: Python
  • ellenzhuwang/implicit_vkood

    An end-to-end multimodal framework incorporating explicit knowledge graphs and OOD detection. (NeurIPS 2023)

    Language: Python
  • Paranioar/GSSF

    [TIP2024] The code of "GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning"

  • LIU42/Contrastive

    Based on Problem B of the 2024 "Teddy Cup" Data Mining Challenge: a cross-modal image-text retrieval model using contrastive learning in a shared feature space.

    Language: Python
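The entry above describes contrastive learning in a shared feature space for image-text retrieval. Such models are typically trained with the symmetric InfoNCE objective popularized by CLIP; a minimal numpy sketch follows, where the batch, temperature, and toy embeddings are illustrative rather than from the repo:

```python
import numpy as np

def clip_contrastive_loss(img, txt, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img, txt: (n, d) arrays; row i of img is paired with row i of txt.
    """
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (n, n) pairwise similarities
    diag = np.arange(len(img))                # matching pairs on the diagonal

    def ce(l):                                # cross-entropy, diagonal targets
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[diag, diag].mean()

    return 0.5 * (ce(logits) + ce(logits.T))  # image-to-text + text-to-image

# Aligned pairs give a much lower loss than mismatched ones.
img, txt = np.eye(3), np.eye(3)
print(clip_contrastive_loss(img, txt)
      < clip_contrastive_loss(img, np.roll(txt, 1, axis=0)))  # True
```

The loss pulls each image toward its own caption and pushes it away from every other caption in the batch, which is what shapes the shared feature space used at retrieval time.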
  • mrzjy/GenshinCLIP

    A simple open-sourced SigLIP model finetuned on Genshin Impact's image-text pairs.

  • Moenupa/clip-image-search

    Searching Images: From CLIP and Beyond

    Language: Jupyter Notebook
  • Paranioar/Awesome_Image_Text_Retrieval_Benchmark

    A unified codebase for image-text retrieval, for further exploration.

    Language: Python
  • whats2000/WeiMoCIR

    Training-free Zero-shot Composed Image Retrieval via Weighted Modality Fusion and Similarity (TAAI 2024)

    Language: Jupyter Notebook
  • AmMoPy/semantic-search-question-answer

    Matching questions to correct answers using pre-trained BERT models.

    Language: Jupyter Notebook
  • romrawinjp/modern-image-search

    Course repository for Modern Image Search, part of the Super AI Engineer Development Program SS4

    Language: Jupyter Notebook
  • jyoung105/koSigLIP

    Korean version of CLIP which achieves Korean cross-modal retrieval and representation generation.