large-vision-language-model
There are 15 repositories under the large-vision-language-model topic.
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles: Latest Advances on Multimodal Large Language Models
PKU-YuanGroup/Video-LLaVA
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
PKU-YuanGroup/MoE-LLaVA
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
yaotingwangofficial/Awesome-MCoT
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
jqtangust/hawk
🔥 🔥 🔥 [NeurIPS 2024] Official Implementation of Hawk: Learning to Understand Open-World Video Anomalies
MMStar-Benchmark/MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"
yu-rp/apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
richard-peng-xia/CARES
[NeurIPS'24] CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
SuperBruceJia/Awesome-Large-Vision-Language-Model
Awesome Large Vision-Language Model: A Curated List of Large Vision-Language Models
Ruiyang-061X/VL-Uncertainty
🔎Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".
ADL-X/LLAVIDAL
This is the official repository of LLAVIDAL
ai4ce/LUWA
[CVPR 2024 Highlight] The first benchmark for lithic use-wear analysis leveraging SOTA vision and vision-language models (DINOv2, GPT-4V), demonstrating AI performance surpassing that of expert archaeologists.
lucaswychan/quant-lvlm
Easy-to-use large vision-language model pipeline for quantitative analysis
amazon-science/THRONE
Code release for THRONE, a CVPR 2024 paper on measuring object hallucinations in LVLM-generated text.