large-multimodal-models
There are 43 repositories under the large-multimodal-models topic.
ShareGPT4Omni/ShareGPT4Video
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
OpenAdaptAI/OpenAdapt
Open-source generative process automation (i.e., generative RPA): AI-first process automation with large language models (LLMs), large action models (LAMs), large multimodal models (LMMs), and visual language models (VLMs).
LLaVA-VL/LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
TinyLLaVA/TinyLLaVA_Factory
A framework for small-scale large multimodal models.
richard-peng-xia/awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
xiaoachen98/Open-LLaVA-NeXT
An open-source implementation for training LLaVA-NeXT.
MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
shikiw/OPERA
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
thunlp/LEGENT
Open Platform for Embodied Agents
Psycoy/MixEval
The official evaluation suite and dynamic data release for MixEval.
zjysteven/lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
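To make the finetuning entry above concrete, here is a minimal, generic sketch of LoRA finetuning for a LLaVA-style model using Hugging Face transformers and peft. It illustrates the kind of setup a codebase like lmms-finetune wraps; it is not that repo's actual interface, and the checkpoint ID and target-module names are assumptions.

```python
# Generic LoRA-finetuning setup for a LLaVA-style model (sketch only,
# NOT lmms-finetune's interface; model ID and target modules are assumed).
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)  # tokenizer + image processor
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)

# Attach low-rank adapters to the attention projections; the base
# weights stay frozen, so only the adapter parameters are trained.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # sanity check: only a small fraction trains
```

From here, training would proceed with an ordinary supervised loop (or a Trainer) over batches built by the processor.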
ShareGPT4Omni/ShareGPT4V
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
sshh12/multi_token
Embed arbitrary modalities (images, audio, documents, etc) into large language models.
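The general technique behind entries like the one above is to learn a projection from a (typically frozen) modality encoder's feature space into the LLM's token-embedding space, so the projected "soft tokens" can be spliced into the input sequence alongside text embeddings. A minimal PyTorch sketch follows, with all dimensions and names as illustrative assumptions (not multi_token's API):

```python
# Sketch: map pooled features from a frozen modality encoder into a fixed
# number of LLM-token-sized embeddings. Shapes/names are assumptions.
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    def __init__(self, encoder_dim: int, llm_dim: int, num_tokens: int = 8):
        super().__init__()
        self.num_tokens = num_tokens
        self.llm_dim = llm_dim
        # One linear layer producing num_tokens embeddings per input.
        self.proj = nn.Linear(encoder_dim, num_tokens * llm_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, encoder_dim) pooled modality features
        out = self.proj(feats)  # (batch, num_tokens * llm_dim)
        return out.view(-1, self.num_tokens, self.llm_dim)

proj = ModalityProjector(encoder_dim=1024, llm_dim=4096)
audio_feats = torch.randn(2, 1024)   # stand-in for a frozen encoder's output
soft_tokens = proj(audio_feats)      # (2, 8, 4096), ready to concatenate
```

The projected tokens are concatenated with the text embeddings before the LLM forward pass; only the projector (and optionally adapters) is trained.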
MMStar-Benchmark/MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
friedrichor/Awesome-Multimodal-Papers
A curated list of awesome Multimodal studies.
shikiw/Modality-Integration-Rate
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
AIFEG/BenchLMM
[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
yu-rp/apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
VisualWebBench/VisualWebBench
Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
bzluan/TextCoT
The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.
YanqiDai/MMRole
A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
360CVGroup/Inner-Adaptor-Architecture
An LMM whose capabilities are a strict superset of the embedded LLM's.
ParadoxZW/LLaVA-UHD-Better
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
2toinf/IVM
[NeurIPS 2024] The official implementation of "Instruction-Guided Visual Masking"
mbzuai-oryx/Camel-Bench
CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions.
MileBench/MileBench
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
zchoi/Multi-Modal-Large-Language-Learning
A curated collection of multi-modal large language model papers and projects, including popular training strategies, e.g., PEFT and LoRA.
bowen-upenn/Agent_Rationality
This is the official repository of the paper "Towards Rationality in Language and Multimodal Agents: A Survey"
visual-haystacks/vhs_benchmark
🔥 Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"
eric-ai-lab/ProbMed
"Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"
xyz9911/FLAME
[AAAI-25] FLAME: Learning to Navigate with Multimodal LLM in Urban Environments (arXiv:2408.11051)
Psycoy/MixEval-X
The official GitHub repo for MixEval-X, the first any-to-any, real-world benchmark.
rohit901/VANE-Bench
Contains code and documentation for our VANE-Bench paper.
h4nwei/2AFC-LMMs
[TCSVT'24] Official implementation of 2AFC-LMMs
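For context, in a 2AFC (two-alternative forced choice) protocol the model picks the better of two stimuli per trial, and global scores are then recovered from the pairwise win counts. The sketch below uses a Bradley-Terry-style maximum likelihood fit as one standard way to do that aggregation; the win matrix is fabricated for illustration, and this is not the repo's code (the paper itself uses MAP estimation).

```python
# Recover latent quality scores from pairwise 2AFC outcomes via a
# Bradley-Terry-style maximum likelihood fit (illustrative sketch).
import numpy as np
from scipy.optimize import minimize

wins = np.array([  # wins[i, j] = times item i was chosen over item j (fabricated)
    [0, 8, 9],
    [2, 0, 6],
    [1, 4, 0],
], dtype=float)
n = wins.shape[0]

def neg_log_likelihood(scores: np.ndarray) -> float:
    # Model: P(i beats j) = sigmoid(s_i - s_j)
    diff = scores[:, None] - scores[None, :]
    p = 1.0 / (1.0 + np.exp(-diff))
    return -(wins * np.log(p + 1e-12)).sum()

res = minimize(neg_log_likelihood, np.zeros(n), method="L-BFGS-B")
print(res.x - res.x.mean())  # zero-centered latent quality scores
```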
ShareGPT4Omni/ShareGPT4Omni
ShareGPT4Omni: Towards Building Omni Large Multi-modal Models with Comprehensive Multi-modal Annotations