XIANGLIU03's Stars
scvready123/IterWeGO
This is the implementation of our paper, "Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning".
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
liyongqi67/GRACE
FlyCuteBird/MKTLON
The source code of MKTLON
microsoft/BridgeTower
Open source code for AAAI 2023 Paper "BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning"
cluel01/clip-branches
96-Zachary/vse_2ad
AAA-Zheng/Listwise_ITR
Official PyTorch implementation of the paper "Integrating Listwise Ranking into Pairwise-based Image-Text Retrieval"
AAA-Zheng/LG_ITM
Official PyTorch implementation of the paper "Integrating Language Guidance into Image-Text Matching for Correcting False Negatives"
facebookresearch/flip
Official Open Source code for "Scaling Language-Image Pre-training via Masking"
Mario0716/SCCMR-master
Soft Contrastive Cross-Modal Retrieval (PyTorch code)
vkhoi/cora_cvpr24
HuiChen24/IMRAM
Code for our CVPR 2020 paper "IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval"
RustamyF/clip-multimodal-ml
McGill-NLP/diffusion-itm
Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?"
mesnico/ALADIN
Official implementation of the paper "ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval"
Zjamie813/SelfAlign
winycg/CLIP-KD
[CVPR-2024] Official implementations of CLIP-KD: An Empirical Study of CLIP Model Distillation
winycg/MCL
[AAAI-2022 Oral] Official implementations of MCL: Mutual Contrastive Learning for Visual Representation Learning
Paranioar/SGRAF
[AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Yuting-Gao/PyramidCLIP
Implementation of PyramidCLIP (NeurIPS 2022).
yzhuoning/Awesome-CLIP
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
StanfordMIMI/villa
ViLLA: Fine-grained vision-language representation learning from real-world data
Wangt-CN/Code_CASC
BruceW91/CVSE
The official source code for the paper Consensus-Aware Visual-Semantic Embedding for Image-Text Matching (ECCV 2020)
liuyyy111/ConVSE
PyTorch source code for "Regularizing Visual Semantic Embedding with Contrastive Learning for Image-Text Matching"
CrossmodalGroup/CMCAN
Implementation of our AAAI2022 paper, Show Your Faith: Cross-Modal Confidence-Aware Network for Image-Text Matching.
CrossmodalGroup/ESL
zengyan-97/X2-VLM
All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)