large-multimodal-models
There are 43 repositories under the large-multimodal-models topic.
ShareGPT4Omni/ShareGPT4Video
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
OpenAdaptAI/OpenAdapt
Open-source generative process automation (i.e., generative RPA): AI-first process automation with large language models (LLMs), large action models (LAMs), large multimodal models (LMMs), and visual language models (VLMs).
LLaVA-VL/LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
TinyLLaVA/TinyLLaVA_Factory
A framework for small-scale large multimodal models.
richard-peng-xia/awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
xiaoachen98/Open-LLaVA-NeXT
An open-source implementation for training LLaVA-NeXT.
MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
shikiw/OPERA
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
thunlp/LEGENT
Open Platform for Embodied Agents
Psycoy/MixEval
The official evaluation suite and dynamic data release for MixEval.
zjysteven/lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
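To make the finetuning entry above concrete, here is a minimal, generic sketch of LoRA finetuning for a LLaVA-style model using Hugging Face transformers and peft. It illustrates the kind of setup a codebase like lmms-finetune wraps; it is not that repo's actual interface, and the checkpoint ID and target-module names are assumptions.

```python
# Generic LoRA-finetuning setup for a LLaVA-style model (sketch only,
# NOT lmms-finetune's interface; model ID and target modules are assumed).
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)  # tokenizer + image processor
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)

# Attach low-rank adapters to the attention projections; the base
# weights stay frozen, so only the adapter parameters are trained.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # sanity check: only a small fraction trains
```

From here, training would proceed with an ordinary supervised loop (or a Trainer) over batches built by the processor.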
ShareGPT4Omni/ShareGPT4V
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
sshh12/multi_token
Embed arbitrary modalities (images, audio, documents, etc) into large language models.
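The general technique behind entries like the one above is to learn a projection from a (typically frozen) modality encoder's feature space into the LLM's token-embedding space, so the projected "soft tokens" can be spliced into the input sequence alongside text embeddings. A minimal PyTorch sketch follows, with all dimensions and names as illustrative assumptions (not multi_token's API):

```python
# Sketch: map pooled features from a frozen modality encoder into a fixed
# number of LLM-token-sized embeddings. Shapes/names are assumptions.
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    def __init__(self, encoder_dim: int, llm_dim: int, num_tokens: int = 8):
        super().__init__()
        self.num_tokens = num_tokens
        self.llm_dim = llm_dim
        # One linear layer producing num_tokens embeddings per input.
        self.proj = nn.Linear(encoder_dim, num_tokens * llm_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, encoder_dim) pooled modality features
        out = self.proj(feats)  # (batch, num_tokens * llm_dim)
        return out.view(-1, self.num_tokens, self.llm_dim)

proj = ModalityProjector(encoder_dim=1024, llm_dim=4096)
audio_feats = torch.randn(2, 1024)   # stand-in for a frozen encoder's output
soft_tokens = proj(audio_feats)      # (2, 8, 4096), ready to concatenate
```

The projected tokens are concatenated with the text embeddings before the LLM forward pass; only the projector (and optionally adapters) is trained.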
MMStar-Benchmark/MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
friedrichor/Awesome-Multimodal-Papers
A curated list of awesome Multimodal studies.
shikiw/Modality-Integration-Rate
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
AIFEG/BenchLMM
[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
yu-rp/apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
VisualWebBench/VisualWebBench
Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
bzluan/TextCoT
The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.
YanqiDai/MMRole
A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
360CVGroup/Inner-Adaptor-Architecture
An LMM whose capabilities are a strict superset of the embedded LLM's.
ParadoxZW/LLaVA-UHD-Better
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
2toinf/IVM
[NeurIPS 2024] The official implementation of "Instruction-Guided Visual Masking"
mbzuai-oryx/Camel-Bench
CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions.
MileBench/MileBench
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
zchoi/Multi-Modal-Large-Language-Learning
A curated collection of multi-modal large language model papers and projects, including popular training strategies, e.g., PEFT and LoRA.
bowen-upenn/Agent_Rationality
This is the official repository of the paper "Towards Rationality in Language and Multimodal Agents: A Survey"
visual-haystacks/vhs_benchmark
🔥 Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"
eric-ai-lab/ProbMed
"Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"
xyz9911/FLAME
[AAAI-25] FLAME: Learning to Navigate with Multimodal LLM in Urban Environments (arXiv:2408.11051)
Psycoy/MixEval-X
The official GitHub repo for MixEval-X, the first any-to-any, real-world benchmark.
rohit901/VANE-Bench
Contains code and documentation for our VANE-Bench paper.
h4nwei/2AFC-LMMs
[TCSVT'24] Official implementation of 2AFC-LMMs
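For context, in a 2AFC (two-alternative forced choice) protocol the model picks the better of two stimuli per trial, and global scores are then recovered from the pairwise win counts. The sketch below uses a Bradley-Terry-style maximum likelihood fit as one standard way to do that aggregation; the win matrix is fabricated for illustration, and this is not the repo's code (the paper itself uses MAP estimation).

```python
# Recover latent quality scores from pairwise 2AFC outcomes via a
# Bradley-Terry-style maximum likelihood fit (illustrative sketch).
import numpy as np
from scipy.optimize import minimize

wins = np.array([  # wins[i, j] = times item i was chosen over item j (fabricated)
    [0, 8, 9],
    [2, 0, 6],
    [1, 4, 0],
], dtype=float)
n = wins.shape[0]

def neg_log_likelihood(scores: np.ndarray) -> float:
    # Model: P(i beats j) = sigmoid(s_i - s_j)
    diff = scores[:, None] - scores[None, :]
    p = 1.0 / (1.0 + np.exp(-diff))
    return -(wins * np.log(p + 1e-12)).sum()

res = minimize(neg_log_likelihood, np.zeros(n), method="L-BFGS-B")
print(res.x - res.x.mean())  # zero-centered latent quality scores
```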
ShareGPT4Omni/ShareGPT4Omni
ShareGPT4Omni: Towards Building Omni Large Multi-modal Models with Comprehensive Multi-modal Annotations