vision-language-pretraining
There are 31 repositories under the vision-language-pretraining topic.
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
deepseek-ai/DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
mbzuai-oryx/Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversations about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representations. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' framework for video-based conversational models.
Sense-GVT/DeCLIP
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
TXH-mercury/VALOR
Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
mbzuai-oryx/VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
sail-sg/ptp
[CVPR 2023] Code for "Position-guided Text Prompt for Vision-Language Pre-training"
Surrey-UP-Lab/RegionSpot
Recognize Any Regions
ArrowLuo/SegCLIP
PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"
vgthengane/Continual-CLIP
Official repository for "CLIP model is an Efficient Continual Learner".
jusiro/FLAIR
FLAIR: A Foundation LAnguage-Image model of the Retina for fundus image understanding.
marslanm/Multimodality-Representation-Learning
This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the accompanying accepted survey: https://dl.acm.org/doi/abs/10.1145/3617833
megvii-research/protoclip
📍 Official PyTorch implementation of the paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)
Zoky-2020/SGA
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models. [ICCV 2023 Oral]
HieuPhan33/CVPR2024_MAVL
Multi-Aspect Vision Language Pretraining (CVPR 2024)
TXH-mercury/COSA
Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
TencentARC/FLM
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
yiren-jian/BLIText
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
alinlab/b2t
Bias-to-Text: Debiasing Unknown Visual Biases through Language Interpretation
omipan/svl_adapter
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
adarobustness/adaptation_robustness
Evaluate robustness of adaptation methods on large vision-language models
ChenDelong1999/ITRA
A codebase for flexible and efficient Image Text Representation Alignment
unitaryai/VTC
VTC: Improving Video-Text Retrieval with User Comments
jaisidhsingh/LoRA-CLIP
Easy wrapper for inserting LoRA layers in CLIP.
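The idea behind such a wrapper is to add low-rank adapter matrices alongside CLIP's frozen linear projections. Below is a minimal, dependency-free sketch of a LoRA-augmented linear layer under assumed names (`rank` implied by the shape of `A`, `alpha` for scaling); it illustrates the technique only and is not the repository's actual API.

```python
class LoRALinear:
    """Linear layer with a low-rank additive update: y = Wx + (alpha/r) * B(Ax).

    W is the frozen pretrained weight; only the small matrices A and B
    would be trained. Pure-Python matrices (lists of rows) keep the
    sketch self-contained.
    """

    def __init__(self, weight, A, B, alpha=1.0):
        self.weight = weight            # frozen (out_dim, in_dim) base weight
        self.A = A                      # trainable (rank, in_dim) down-projection
        self.B = B                      # trainable (out_dim, rank) up-projection
        self.scale = alpha / len(A)     # alpha / rank

    def __call__(self, x):
        def matvec(M, v):
            return [sum(m * vi for m, vi in zip(row, v)) for row in M]
        base = matvec(self.weight, x)           # frozen path: W x
        lora = matvec(self.B, matvec(self.A, x))  # low-rank path: B (A x)
        return [b + self.scale * l for b, l in zip(base, lora)]
```

Initializing `B` to zeros makes the adapted layer start out identical to the frozen one, which is the usual way LoRA is injected without disturbing pretrained behavior.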
ahmdtaha/distributed_sigmoid_loss
Unofficial implementation of "Sigmoid Loss for Language Image Pre-Training" (SigLIP)
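The sigmoid loss replaces CLIP's batch-wide softmax with an independent binary classification per image-text pair, so each pair contributes a term on its own. A minimal dependency-free sketch of that pairwise loss follows; the `temp` and `bias` names and defaults are illustrative assumptions, not taken from the repository above.

```python
import math

def sigmoid_loss(img_embs, txt_embs, temp=10.0, bias=-10.0):
    """Average pairwise sigmoid loss over a batch of embeddings.

    Matching pairs (i == j) get label +1, all other pairs -1.
    Each pair is scored independently: no normalization across
    the batch, unlike a softmax-based contrastive loss.
    """
    n = len(img_embs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            sim = sum(a * b for a, b in zip(img_embs[i], txt_embs[j]))
            label = 1.0 if i == j else -1.0
            logit = label * (temp * sim + bias)
            # -log(sigmoid(logit)) written as log1p(exp(-logit)) for stability
            total += math.log1p(math.exp(-logit))
    return total / (n * n)
```

Because every pair is scored independently, the loss decomposes cleanly across devices, which is what makes a distributed implementation attractive.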
LooperXX/ManagerTower
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
YyzHarry/vlm-fairness
Demographic Bias of Vision-Language Foundation Models in Medical Imaging
BUAADreamer/CCRK
[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
xmed-lab/FD-SOS
MICCAI 2024: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images