vision-language-learning
There are 10 repositories under the vision-language-learning topic.
AIDC-AI/Ovis
A novel Multimodal Large Language Model (MLLM) architecture designed to structurally align visual and textual embeddings.
shikiw/OPERA
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
RLHF-V/RLAIF-V
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
YunzeMan/Situation3D
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
LooperXX/ManagerTower
Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
SHTUPLUS/GITM-MR
The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".
yubin1219/CrossVLT
Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation (Published in IEEE TMM 2023)
lyuchenyang/Dialogue-to-Video-Retrieval
Code for ECIR 2023 paper "Dialogue-to-Video Retrieval"
abhinav-neil/socratic-models
Socratic models for multimodal reasoning & image captioning
Ravi-Teja-konda/TunedLlavaDelights
Explore the rich flavors of Indian desserts with TunedLlavaDelights. Through Llava fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.