mllm
There are 36 repositories under the mllm topic.
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
X-PLUG/MobileAgent
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
InternLM/InternLM-XComposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
atfortes/Awesome-LLM-Reasoning
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
CircleRadon/Osprey
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
BAAI-DCAI/Bunny
A family of lightweight multimodal models.
BradyFU/Woodpecker
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
FoundationVision/Groma
Grounded Multimodal Large Language Model with Localized Visual Tokenization
X-PLUG/Youku-mPLUG
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Coobiw/MiniGPT4Qwen
Personal Project: MPP-Qwen14B (Multimodal Pipeline Parallel-Qwen14B). Don't let poverty limit your imagination! Train your own 14B LLaVA-like MLLM on RTX3090/4090 24GB.
gokayfem/ComfyUI_VLM_nodes
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
X-PLUG/mPLUG-2
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Atomic-man007/Awesome_Multimodel_LLM
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.
FoundationVision/GenerateU
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
360CVGroup/SEEChat
Multimodal chatbot with computer vision capabilities integrated
TIGER-AI-Lab/Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning"
BAAI-DCAI/DataOptim
A collection of visual instruction tuning datasets.
Ahnsun/merlin
Merlin: Empowering Multimodal LLMs with Foresight Minds
graphic-design-ai/graphist
Official Repo of Graphist
parsee-ai/parsee-datasets
Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/parsee-ai
X-PLUG/mPLUG-HalOwl
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
KwaiVGI/Uniaa
Unified Multi-modal IAA Baseline and Benchmark
VisualWebBench/VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
UCSC-VLAA/Sight-Beyond-Text
This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
gyunggyung/OpenMLLM
Open Source + Multilingual MLLM + Fine-tuning + Distillation + More efficient models and learning + ?
bigai-nlco/LSTP-Chat
A Video Chat Agent with Temporal Prior
zzq2000/MIKO
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery
CharlieDDDD/AISurveyPapers
Large Visual Language Model (LVLM), Large Language Model (LLM), Multimodal Large Language Model (MLLM), Alignment, Agent, AI System, Survey
BUAADreamer/Chinese-LLaVA-Med
Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine
eric-ai-lab/MultipanelVQA
Code for the MultipanelVQA benchmark "Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA"
isLinXu/MLLM-Research-Learn
Conducting learning and research on MLLM based on the MME rankings.
kassy11/Awesome_Visually-Augmented_NLP
🖼️ Latest Papers on Visually (Imagination)-Augmented NLP
xirui-li/attacks-on-LLMs
Awesome list for attacks on large language models.
alexander-moore/vlm
Composition of Multimodal Language Models From Scratch
kassy11/Awesome_NLP_PaperList
🤖 A list of paper lists of NLP-related papers on GitHub