vision-language-pretraining

There are 31 repositories under the vision-language-pretraining topic.

  • salesforce/LAVIS

    LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook
  • DAMO-NLP-SG/Video-LLaMA

    [EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Language: Python
  • deepseek-ai/DeepSeek-VL

    DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language: Python
  • mbzuai-oryx/Video-ChatGPT

    [ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language: Python
  • Sense-GVT/DeCLIP

    Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

Language: Python
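The contrastive language-image objective that DeCLIP makes more data-efficient is the standard CLIP-style symmetric InfoNCE loss over paired image/text embeddings. A minimal numpy sketch of that baseline objective (not DeCLIP's augmented multi-supervision version; shapes and the `temperature` default are illustrative):

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Illustrative sketch of the CLIP-style baseline; not taken from the
    DeCLIP codebase.
    """
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (B, B) similarity matrix
    labels = np.arange(len(logits))         # matched pairs sit on the diagonal

    def cross_entropy(z, y):
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image->text and text->image directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Perfectly aligned pairs (identical image and text embeddings) drive the loss toward zero, while randomly paired embeddings leave it near log of the batch size.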
  • TXH-mercury/VALOR

    Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Language: Python
  • mbzuai-oryx/VideoGPT-plus

Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"

Language: Python
  • sail-sg/ptp

[CVPR 2023] Code for "Position-guided Text Prompt for Vision-Language Pre-training"

Language: Python
  • Surrey-UP-Lab/RegionSpot

    Recognize Any Regions

Language: Python
  • ArrowLuo/SegCLIP

    PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"

Language: Python
  • vgthengane/Continual-CLIP

    Official repository for "CLIP model is an Efficient Continual Learner".

Language: Python
  • jusiro/FLAIR

    FLAIR: A Foundation LAnguage-Image model of the Retina for fundus image understanding.

Language: Python
  • marslanm/Multimodality-Representation-Learning

This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the accompanying survey accepted at ACM Computing Surveys: https://dl.acm.org/doi/abs/10.1145/3617833

  • megvii-research/protoclip

📍 Official PyTorch implementation of the paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)

Language: Python
  • Zoky-2020/SGA

    Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models. [ICCV 2023 Oral]

Language: Python
  • HieuPhan33/CVPR2024_MAVL

    Multi-Aspect Vision Language Pretraining - CVPR2024

Language: Python
  • TXH-mercury/COSA

    Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Language: Python
  • TencentARC/FLM

    Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)

Language: Python
  • yiren-jian/BLIText

    [NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

Language: Python
  • alinlab/b2t

    Bias-to-Text: Debiasing Unknown Visual Biases through Language Interpretation

Language: Python
  • omipan/svl_adapter

    SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models

Language: Python
  • adarobustness/adaptation_robustness

    Evaluate robustness of adaptation methods on large vision-language models

Language: Shell
  • ChenDelong1999/ITRA

    A codebase for flexible and efficient Image Text Representation Alignment

Language: Python
  • unitaryai/VTC

    VTC: Improving Video-Text Retrieval with User Comments

Language: Python
  • jaisidhsingh/LoRA-CLIP

    Easy wrapper for inserting LoRA layers in CLIP.

Language: Python
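The idea behind a LoRA wrapper like this is to freeze a pretrained linear weight W and learn only a low-rank correction B·A, so the layer computes W·x + (α/r)·B·A·x. A minimal numpy sketch of one such layer (the class name, hyperparameters, and initialization scheme here are illustrative, not LoRA-CLIP's actual API):

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) B A x.

    Illustrative sketch of the LoRA technique; not the LoRA-CLIP wrapper itself.
    """
    def __init__(self, weight, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                   # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(rank, in_dim))   # trainable down-projection
        self.B = np.zeros((out_dim, rank))                     # zero-init up-projection:
                                                               # the layer starts identical
                                                               # to the pretrained one
        self.scale = alpha / rank

    def __call__(self, x):
        # Base projection plus the scaled low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T
```

Because B is zero-initialized, the wrapped layer initially reproduces the pretrained layer exactly; only A and B receive gradients during fine-tuning.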
  • ahmdtaha/distributed_sigmoid_loss

Unofficial implementation of "Sigmoid Loss for Language Image Pre-Training"

Language: Python
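The sigmoid loss replaces CLIP's batch-wide softmax with an independent binary classification per image-text pair: matched pairs get label +1, all mismatched pairs −1. A numpy sketch under the paper's formulation (the temperature `t` and bias `b` are normally learnable; the values here follow the paper's initialization, and this is not the repository's implementation):

```python
import numpy as np

def sigmoid_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss, sketched from the SigLIP paper's formulation.

    Illustrative only; `t` (temperature) and `b` (bias) are treated as
    fixed scalars here rather than learnable parameters.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = t * (img @ txt.T) + b                # (B, B) pairwise logits
    n = len(logits)
    z = -np.ones((n, n)) + 2 * np.eye(n)          # +1 on matched pairs, -1 elsewhere
    # -log sigmoid(z * logits), computed stably as log(1 + exp(-z * logits));
    # sum over all pairings of each image, then average over the batch.
    return np.logaddexp(0.0, -z * logits).sum(axis=1).mean()
```

Because every pair contributes independently, the loss needs no gather of global similarities across devices, which is what makes a distributed implementation like this one attractive.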
  • LooperXX/ManagerTower

    Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

Language: Python
  • YyzHarry/vlm-fairness

    Demographic Bias of Vision-Language Foundation Models in Medical Imaging

Language: Python
  • BUAADreamer/CCRK

    [KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning

Language: Python
  • xmed-lab/FD-SOS

    MICCAI 2024: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images

Language: Python
  • unitaryai/VTC-dataset

Language: Python