# Awesome-LVLM-paper

😎 A curated list of papers about Large Multimodal Models (LVLMs)

## Related Collection

- Our Paper Reading List

| Folder | Description |
| --- | --- |
| LVLM Model | Large multimodal models |
| LVLM Agent | Agents & applications of LVLMs |
| LVLM Hallucination | Benchmarks & methods for hallucination |

## 🏗️ LVLM Models

| Title | Venue/Date | Note | Code | Demo | Picture |
| --- | --- | --- | --- | --- | --- |
| Star<br>InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | NeurIPS 2023 | InstructBLIP | GitHub | Local Demo | instructblip |
| Star<br>Visual Instruction Tuning | NeurIPS 2023 | LLaVA | GitHub | Demo | llava |
| Star<br>LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model | 2023-04 | LLaMA-Adapter V2 | GitHub | Demo | llama |
| Star<br>mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality | 2023-04 | mPLUG-Owl | GitHub | Demo | m-plug |
| Star<br>MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | 2023-04 | MiniGPT-4 | GitHub | - | minigpt-4 |
| Star<br>TextBind: Multi-turn Interleaved Multimodal Instruction-following | 2023-09 | TextBind | GitHub | Demo | textbind |
| Star<br>BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing | 2023-09 | BLIP-Diffusion | GitHub | Demo | blip-diffusion |
| Star<br>NExT-GPT: Any-to-Any Multimodal LLM | 2023-09 | NExT-GPT | GitHub | Demo | next-gpt |

## 🎛️ LVLM Agent

| Title | Venue/Date | Note | Code | Demo | Picture |
| --- | --- | --- | --- | --- | --- |
| Star<br>MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action | 2023-03 | MM-REACT | GitHub | Demo | mm-react |
| Star<br>Visual Programming: Compositional visual reasoning without training | CVPR 2023 Best Paper | VISPROG (similar to ViperGPT) | GitHub | Local Demo | vp |
| Star<br>HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace | 2023-03 | HuggingGPT | GitHub | Demo | huggingface-gpt |
| Star<br>Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models | 2023-04 | Chameleon | GitHub | Demo | chameleon |
| Star<br>IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models | 2023-05 | IdealGPT | GitHub | Local Demo | ideal-gpt |
| Star<br>AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn | 2023-06 | AssistGPT | GitHub | - | assist-gpt |

## 🤕 LVLM Hallucination

| Title | Venue/Date | Note | Code | Demo | Picture |
| --- | --- | --- | --- | --- | --- |
| Star<br>Evaluating Object Hallucination in Large Vision-Language Models | EMNLP 2023 | Simple object hallucination evaluation - POPE | GitHub | - | pope |
| Star<br>Evaluation and Analysis of Hallucination in Large Vision-Language Models | 2023-10 | Hallucination evaluation - HaELM | GitHub | - | HaELM |
| Star<br>Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning | 2023-06 | GPT4-Assisted Visual Instruction Evaluation (GAVIE) & LRV-Instruction | GitHub | Demo | gavie |
| Star<br>Woodpecker: Hallucination Correction for Multimodal Large Language Models | 2023-10 | First work to correct hallucinations in LVLMs | GitHub | Demo | Woodpecker |
| Star<br>Can We Edit Multimodal Large Language Models? | EMNLP 2023 | Knowledge editing benchmark | GitHub | - | mm-edit |
| Star<br>Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans? | EMNLP 2023 | Similar to human illusion? | GitHub | - | illusion |
| Star<br>Ferret: Refer and Ground Anything Anywhere at Any Granularity | ICLR 2024 | Grounding | GitHub | - | ferret |
| Star<br>Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions | ICLR 2024 | Detailed reasoning | GitHub | - | VPG |