vision-and-language
There are 275 repositories under the vision-and-language topic.
aishwaryanr/awesome-generative-ai-guide
A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
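Since LAVIS is a general-purpose library rather than a single model, a minimal sketch may help show how it is typically used. The snippet below follows the loading pattern documented in the LAVIS README; the specific model name ("blip_caption"), model type ("base_coco"), and image path are assumptions for illustration.

```python
# Minimal sketch: image captioning with LAVIS (model/checkpoint names assumed
# from the LAVIS README; swap in any model listed by lavis.models).
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a BLIP captioning model together with its matching preprocessors.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

raw_image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# Generate a caption for the image.
print(model.generate({"image": image}))
```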
roboflow/maestro
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
om-ai-lab/OmAgent
Build multimodal language agents for fast prototyping and production
salesforce/ALBEF
Code for ALBEF: a new vision-language pre-training method
open-mmlab/Multimodal-GPT
Multimodal-GPT
dandelin/ViLT
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
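ViLT also ships as a port in Hugging Face transformers, which is often the quickest way to try it. The sketch below assumes the dandelin/vilt-b32-finetuned-vqa checkpoint on the Hub and a local image file; it is an illustrative usage pattern, not part of the original repository's training code.

```python
# Minimal sketch: visual question answering with the transformers port of ViLT.
# Checkpoint name and image path are assumptions for illustration.
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

image = Image.open("example.jpg").convert("RGB")
question = "How many cats are there?"

# ViLT consumes raw image patches and text tokens jointly, with no CNN backbone
# or region proposals, which is what makes it comparatively lightweight.
inputs = processor(image, question, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```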
om-ai-lab/OmDet
Real-time and accurate open-vocabulary end-to-end object detection
NVlabs/prismer
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
llm-jp/awesome-japanese-llm
Overview of Japanese LLMs
yuewang-cuhk/awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
rhymes-ai/Aria
Codebase for Aria - an Open Multimodal Native MoE
OFA-Sys/ONE-PEACE
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
microsoft/Oscar
Oscar and VinVL
YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
mbzuai-oryx/groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
InternRobotics/PointLLM
[ECCV 2024 Best Paper Candidate & TPAMI 2025] PointLLM: Empowering Large Language Models to Understand Point Clouds
NVlabs/DoRA
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
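Besides the official NVlabs implementation, DoRA is also exposed through Hugging Face PEFT as a flag on LoraConfig. The sketch below assumes a PEFT release that includes the use_dora option (0.9.0 or later) and uses a small causal LM purely as a placeholder base model.

```python
# Minimal sketch: weight-decomposed low-rank adaptation (DoRA) via PEFT.
# Base model, rank, and target modules are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    use_dora=True,  # decompose weights into magnitude and direction, adapt both
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```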
26hzhang/DL-NLP-Readings
My Reading Lists of Deep Learning and Natural Language Processing
SunzeY/AlphaCLIP
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
ChenRocks/UNITER
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
SkalskiP/top-cvpr-2025-papers
This repository is a curated collection of the most exciting and influential CVPR 2025 papers. 🔥 [Paper + Code + Demo]
jackroos/VL-BERT
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
SkalskiP/top-cvpr-2024-papers
This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]
jayleicn/ClipBERT
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
mees/calvin
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
SkalskiP/top-cvpr-2023-papers
This repository is a curated collection of the most exciting and influential CVPR 2023 papers. 🔥 [Paper + Code]
peteanderson80/Matterport3DSimulator
AI Research Platform for Reinforcement Learning from Real Panoramic Images.
vardanagarwal/Proctoring-AI
Software for automatic monitoring in online proctoring
sangminwoo/awesome-vision-and-language
A curated list of awesome vision and language resources (still under construction... stay tuned!)
eric-ai-lab/awesome-vision-language-navigation
A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
zengyan-97/X-VLM
X-VLM: Multi-Grained Vision Language Pre-Training (ICML 2022)
JindongGu/Awesome-Prompting-on-Vision-Language-Model
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
Paranioar/Awesome_Matching_Pretraining_Transfering
A paper list covering large multi-modality models (perception, generation, unification), parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.
google-research-datasets/conceptual-12m
Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.
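Because Conceptual 12M is distributed as (image-URL, caption) pairs rather than as image files, pre-training pipelines typically stream and decode the images themselves. The sketch below assumes a tab-separated file with one URL/caption pair per line; the cc12m.tsv filename and the downstream pipeline step are illustrative assumptions.

```python
# Minimal sketch: iterating over (image-URL, caption) pairs in CC12M-style TSV.
# File name and column layout (url \t caption) are assumptions for illustration.
import csv
from io import BytesIO

import requests
from PIL import Image

with open("cc12m.tsv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f, delimiter="\t")
    for url, caption in reader:
        try:
            resp = requests.get(url, timeout=10)
            image = Image.open(BytesIO(resp.content)).convert("RGB")
        except Exception:
            continue  # skip dead links, which are common in web-collected data
        # ... feed (image, caption) into a vision-language pre-training pipeline
        print(caption)
        break
```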
j-min/VL-T5
PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)