image-text-retrieval

There are 35 repositories under the image-text-retrieval topic.

  • OpenGVLab/InternVL

    [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.

    Language: Python · 7.4k stars
  • salesforce/BLIP

    PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

    Language: Jupyter Notebook · 5.1k stars
  • OFA-Sys/Chinese-CLIP

    Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

    Language: Python · 5k stars
  • Paranioar/Awesome_Matching_Pretraining_Transfering

    A paper list covering large multi-modality models (perception, generation, unification), parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.

  • slavabarkov/tidy

    Offline semantic text-to-image and image-to-image search on Android, powered by a quantized state-of-the-art vision-language pretrained CLIP model and the ONNX Runtime inference engine.

    Language: Kotlin
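tidy's description above (CLIP embeddings plus an ONNX Runtime engine) boils down to nearest-neighbor search over embedding vectors. A minimal numpy sketch of that retrieval step, assuming the text and image embeddings have already been computed; the toy 4-dimensional vectors below stand in for real CLIP outputs and are not from the repo:

```python
import numpy as np

def cosine_retrieve(query_emb, image_embs, top_k=3):
    """Rank gallery images by cosine similarity to a query embedding.

    query_emb: (d,) text embedding; image_embs: (n, d) image embeddings.
    Returns the indices of the top_k most similar images.
    """
    q = query_emb / np.linalg.norm(query_emb)
    g = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = g @ q                       # cosine similarity per gallery image
    return np.argsort(-sims)[:top_k]   # most similar first

# Toy 4-dim vectors standing in for real CLIP embeddings.
gallery = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0, 0.0]])
query = np.array([1.0, 0.05, 0.0, 0.0])
print(cosine_retrieve(query, gallery))  # [0 2 1]
```

In an app like tidy, the embeddings would come from a quantized CLIP model run through ONNX Runtime; the ranking step itself stays this simple.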
  • greyovo/PicQuery

    🔍 Search local images with natural language on Android, powered by OpenAI's CLIP model.

    Language: Kotlin
  • Paranioar/SGRAF

    [AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”

    Language: Python
  • chuhaojin/Text2Poster-ICASSP-22

    Official implementation of the ICASSP-2022 paper "Text2Poster: Laying Out Stylized Texts on Retrieved Images"

    Language: Python
  • alipay/Ant-Multi-Modal-Framework

    Research Code for Multimodal-Cognition Team in Ant Group

    Language: Python
  • howard-hou/BagFormer

    PyTorch code for BagFormer: Better Cross-Modal Retrieval via Bag-wise Interaction

    Language: Python
  • X-PLUG/mPLUG

    mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)

    Language: Python
  • hpc203/Chinese-CLIP-opencv-onnxrun

    Deploy Chinese-CLIP with OpenCV and onnxruntime for text-to-image search: describe the desired picture in a sentence, and matching images are retrieved from the gallery. Includes both C++ and Python versions of the program.

    Language: C++
  • MILVLG/rosita

    ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

    Language: Python
  • cobanov/image-captioning

    Image captioning using Python and BLIP

    Language: Python
  • eric-ai-lab/ComCLIP

    Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"

    Language: Python
  • eric-ai-lab/CPL

    Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"

    Language: Python
  • Paranioar/RCAR

    [TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”

    Language: Python
  • ytaek-oh/fsc-clip

    [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality

    Language: Python
  • alipay/PC2-NoiseofWeb

    Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark containing 100K image-text pairs for robust image-text matching/retrieval models.

    Language: Python
  • frank-chris/ImageTextRetrieval

    We implement several cross-modal learning schemes (Siamese Network, Correlational Network, and Deep Cross-Modal Projection Learning) and study their performance. We also propose a modified Deep Cross-Modal Projection Learning model that uses a different image feature extractor, and evaluate it on image-text retrieval over a fashion clothing dataset.

    Language: Jupyter Notebook
  • kaylode/tern

    Cross-modal retrieval using Transformer Encoder Reasoning Networks (TERN), with metric learning and FAISS for fast similarity search on GPU.

    Language: Jupyter Notebook
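The TERN entry above pairs metric learning with FAISS-based similarity search. As a minimal sketch of the metric-learning side, here is a hinge-style triplet loss in plain numpy; the vectors and margin below are illustrative, not taken from the repo:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: the positive should be closer to the
    anchor than the negative, by at least `margin` (Euclidean distances)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# An easy negative is already far enough away: loss is zero.
a, p = np.array([1.0, 0.0]), np.array([0.9, 0.1])
print(triplet_loss(a, p, np.array([0.0, 1.0])))      # 0.0
# A hard negative inside the margin produces a positive loss.
print(triplet_loss(a, p, np.array([0.8, 0.2])) > 0)  # True
```

After training embeddings with an objective like this, the learned vectors can be indexed with FAISS (e.g. an inner-product flat index) for fast retrieval.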
  • Paranioar/DBL

    [TIP2024] The code of “Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching”

    Language: Python
  • marialymperaiou/knowledge-enhanced-multimodal-learning

    A list of research papers on knowledge-enhanced multimodal learning

  • BUAADreamer/CCRK

    [KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning

    Language: Python
  • ellenzhuwang/implicit_vkood

    An end-to-end multimodal framework incorporating explicit knowledge graphs and OOD detection. (NeurIPS 2023)

    Language: Python
  • Paranioar/GSSF

    [TIP2024] The code of "GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning"

  • LIU42/Contrastive

    Based on Problem B of the 2024 "Teddy Cup" Data Mining Challenge: a cross-modal image-text retrieval model using contrastive learning in a shared feature space.

    Language: Python
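The entry above describes contrastive learning in a shared feature space for image-text retrieval. Such models are typically trained with the symmetric InfoNCE objective popularized by CLIP; a minimal numpy sketch follows, where the batch, temperature, and toy embeddings are illustrative rather than from the repo:

```python
import numpy as np

def clip_contrastive_loss(img, txt, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img, txt: (n, d) arrays; row i of img is paired with row i of txt.
    """
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (n, n) pairwise similarities
    diag = np.arange(len(img))                # matching pairs on the diagonal

    def ce(l):                                # cross-entropy, diagonal targets
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[diag, diag].mean()

    return 0.5 * (ce(logits) + ce(logits.T))  # image-to-text + text-to-image

# Aligned pairs give a much lower loss than mismatched ones.
img, txt = np.eye(3), np.eye(3)
print(clip_contrastive_loss(img, txt)
      < clip_contrastive_loss(img, np.roll(txt, 1, axis=0)))  # True
```

The loss pulls each image toward its own caption and pushes it away from every other caption in the batch, which is what shapes the shared feature space used at retrieval time.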
  • mrzjy/GenshinCLIP

    A simple open-sourced SigLIP model finetuned on Genshin Impact's image-text pairs.

  • Moenupa/clip-image-search

    Searching Images: From CLIP and Beyond

    Language: Jupyter Notebook
  • Paranioar/Awesome_Image_Text_Retrieval_Benchmark

    A unified codebase for image-text retrieval, for further exploration.

    Language: Python
  • whats2000/WeiMoCIR

    Training-free Zero-shot Composed Image Retrieval via Weighted Modality Fusion and Similarity (TAAI 2024)

    Language: Jupyter Notebook
  • AmMoPy/semantic-search-question-answer

    Matching questions to correct answers using pre-trained BERT models.

    Language: Jupyter Notebook
  • romrawinjp/modern-image-search

    Course repository for Modern Image Search, part of the Super AI Engineer Development Program SS4

    Language: Jupyter Notebook
  • jyoung105/koSigLIP

    Korean version of CLIP which achieves Korean cross-modal retrieval and representation generation.