large-vision-language-models
There are 40 repositories under the large-vision-language-models topic.
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles:Latest Advances on Multimodal Large Language Models
ShareGPT4Omni/ShareGPT4Video
[NeurIPS 2024] An official implementation of "ShareGPT4Video: Improving Video Understanding and Generation with Better Captions"
NVlabs/DoRA
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
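Since the DoRA entry above names a concrete parameter-efficient finetuning technique, a brief illustration may help. The following is a minimal sketch of weight-decomposed low-rank adaptation written from the paper's high-level description; it is not the official NVlabs/DoRA code, and the layer names, shapes, and hyperparameters are illustrative assumptions.

```python
# Minimal DoRA-style sketch (NOT the official NVlabs/DoRA implementation).
# Assumptions: rank/alpha defaults, column-wise normalization as in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        out_f, in_f = base.weight.shape
        # Frozen pretrained weight W0 and (optional) bias.
        self.weight = nn.Parameter(base.weight.detach().clone(), requires_grad=False)
        self.bias = base.bias
        # LoRA-style low-rank update: delta_W = B @ A, scaled by alpha / rank.
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        self.scaling = alpha / rank
        # Learnable magnitude vector, initialized to the column-wise norm of W0.
        self.m = nn.Parameter(self.weight.norm(p=2, dim=0, keepdim=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Direction: column-normalized (W0 + delta_W); magnitude rescales each column.
        v = self.weight + self.scaling * (self.B @ self.A)
        v = v / v.norm(p=2, dim=0, keepdim=True)
        return F.linear(x, self.m * v, self.bias)

# Example: adapt one frozen projection layer.
layer = DoRALinear(nn.Linear(768, 768), rank=8)
y = layer(torch.randn(4, 768))
```

Wrapping a frozen linear layer this way keeps W0 fixed and trains only the low-rank factors and the per-column magnitude.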
YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
BradyFU/Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Paranioar/Awesome_Matching_Pretraining_Transfering
A paper list on large multi-modality models (perception, generation, unification), parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.
burglarhobbit/Awesome-Medical-Large-Language-Models
Curated papers on large language models in the healthcare and medical domains
tianyi-lab/HallusionBench
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
ShareGPT4Omni/ShareGPT4V
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
khuangaf/Awesome-Chart-Understanding
A curated list of recent and past chart understanding work, based on our IEEE TKDE survey paper "From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models."
MMStar-Benchmark/MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"
NishilBalar/Awesome-LVLM-Hallucination
An up-to-date curated list of state-of-the-art research, papers, and resources on hallucinations in large vision-language models
mbzuai-oryx/GeoPixel
GeoPixel is a pixel-grounding large multimodal model developed specifically for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabilities.
llmbev/talk2bev
Talk2BEV: Language-Enhanced Bird's Eye View Maps (ICRA'24)
yu-rp/apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
yfzhang114/LLaVA-Align
This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strategy.
ys-zong/VLGuard
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
Ruiyang-061X/Awesome-MLLM-Uncertainty
✨A curated list of papers on uncertainty in multi-modal large language models (MLLMs).
FudanDISC/ReForm-Eval
A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
SuperBruceJia/Awesome-Mixture-of-Experts
Awesome Mixture of Experts (MoE): A Curated List of Mixture of Experts (MoE) and Mixture of Multimodal Experts (MoME)
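Since this list centers on mixture-of-experts architectures, a small routing sketch may clarify the basic mechanism. This is a minimal sparse MoE layer with top-k token routing; it is not taken from any listed repository, and all names and shapes are assumptions.

```python
# Minimal sparse mixture-of-experts (MoE) sketch with top-k routing.
# Illustrative only; expert width, gating, and defaults are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)  # per-token expert logits
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [num_tokens, dim]
        gate_logits = self.router(x)
        weights, idx = gate_logits.topk(self.top_k, dim=-1)  # pick k experts per token
        weights = F.softmax(weights, dim=-1)                  # renormalize selected gates
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route 16 tokens of width 512 through 8 experts, 2 active per token.
moe = SparseMoE(dim=512)
y = moe(torch.randn(16, 512))
```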
The-Martyr/CausalMM
[ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
SuperBruceJia/Awesome-Large-Vision-Language-Model
Awesome Large Vision-Language Model: A Curated List of Large Vision-Language Models
sakura2233565548/TabPedia
This repository is the codebase for "TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy"
sled-group/moh
[NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models
khuangaf/CHOCOLATE
Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"
SkiddieAhn/Paper-AnyAnomaly
Official PyTorch implementation of the paper "AnyAnomaly"
bowen-upenn/Multi-Agent-VQA
[CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering
NKU-MetautoAI/awesome-large-vision-language-models
Advances in recent large vision-language models (LVLMs)
Wu-Zongyu/LanP
Official implementation of "LanP: Rethinking the Impact of Language Priors in Large Vision-Language Models"
The-Martyr/Awesome-Modality-Priors-in-MLLMs
Latest Advances on Modality Priors in Multimodal Large Language Models
andy9705/SumGD
[NAACL 2025 Findings] Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding
ShareGPT4Omni/ShareGPT4Omni
ShareGPT4Omni: Towards Building Omni Large Multi-modal Models with Comprehensive Multi-modal Annotations
gaotiexinqu/V2P-Bench
V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
CristianoPatricio/CBVLM
Code for the paper "CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification".
camilochs/visgraphvar
VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models
SHTUPLUS/ICCC_CVPR2024
Official Implementation for Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning (CVPR 2024).