mllm

There are 36 repositories under the mllm topic.

  • microsoft/unilm

    Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python
  • X-PLUG/MobileAgent

    Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception

Language: Python
  • InternLM/InternLM-XComposer

    InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

Language: Python
  • atfortes/Awesome-LLM-Reasoning

Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning, and Multimodality.

  • X-PLUG/mPLUG-DocOwl

    mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Language: Python
  • CircleRadon/Osprey

    [CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Language: Python
  • BAAI-DCAI/Bunny

    A family of lightweight multimodal models.

Language: Python
  • BradyFU/Woodpecker

    ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.

Language: Python
  • FoundationVision/Groma

    Grounded Multimodal Large Language Model with Localized Visual Tokenization

Language: Python
  • X-PLUG/Youku-mPLUG

    Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

Language: Python
  • Coobiw/MiniGPT4Qwen

Personal Project: MPP-Qwen14B (Multimodal Pipeline Parallel Qwen14B). Don't let poverty limit your imagination! Train your own 14B LLaVA-like MLLM on an RTX 3090/4090 with 24 GB.

Language: Jupyter Notebook
  • gokayfem/ComfyUI_VLM_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, and Consistent and Random Creative Prompt Generation (a minimal node-interface sketch appears after this list).

Language: Python
  • X-PLUG/mPLUG-2

    mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)

Language: Python
  • Atomic-man007/Awesome_Multimodel_LLM

Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLMs). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.

  • FoundationVision/GenerateU

    [CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

Language: Python
  • 360CVGroup/SEEChat

A multimodal chatbot with integrated computer vision capabilities.

Language: Python
  • TIGER-AI-Lab/Mantis

Official code for the paper "Mantis: Multi-Image Instruction Tuning"

Language: Python
  • BAAI-DCAI/DataOptim

    A collection of visual instruction tuning datasets.

Language: Python
  • Ahnsun/merlin

    Merlin: Empowering Multimodal LLMs with Foresight Minds

Language: Python
  • graphic-design-ai/graphist

    Official Repo of Graphist

  • parsee-ai/parsee-datasets

Datasets, case studies, and benchmarks for extracting structured information from PDFs, HTML files, or images, created by the Parsee.ai team. The datasets are also on Hugging Face: https://huggingface.co/parsee-ai (a loading sketch appears after this list).

Language: Jupyter Notebook
  • X-PLUG/mPLUG-HalOwl

mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation

Language: Python
  • KwaiVGI/Uniaa

Unified Multi-modal Image Aesthetic Assessment (IAA) Baseline and Benchmark

  • VisualWebBench/VisualWebBench

Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"

Language: Python
  • UCSC-VLAA/Sight-Beyond-Text

    This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"

Language: Python
  • gyunggyung/OpenMLLM

    Open Source + Multilingual MLLM + Fine-tuning + Distillation + More efficient models and learning + ?

Language: C
  • bigai-nlco/LSTP-Chat

    A Video Chat Agent with Temporal Prior

Language: Python
  • zzq2000/MIKO

MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery

Language: Python
  • CharlieDDDD/AISurveyPapers

Large Visual Language Model (LVLM), Large Language Model (LLM), Multimodal Large Language Model (MLLM), Alignment, Agent, AI System, Survey

  • BUAADreamer/Chinese-LLaVA-Med

A Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine

Language: Python
  • eric-ai-lab/MultipanelVQA

    Code for the MultipanelVQA benchmark "Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA"

Language: Jupyter Notebook
  • isLinXu/MLLM-Research-Learn

Learning and research on MLLMs, based on the MME benchmark rankings.

  • kassy11/Awesome_Visually-Augmented_NLP

🖼️ Latest Papers on Visually (Imagination)-Augmented NLP

  • xirui-li/attacks-on-LLMs

    Awesome list for attacks on large language models.

  • alexander-moore/vlm

    Composition of Multimodal Language Models From Scratch

Language: Jupyter Notebook
  • kassy11/Awesome_NLP_PaperList

🤖 A list of paper lists of NLP-related papers on GitHub
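
For the gokayfem/ComfyUI_VLM_nodes entry above: ComfyUI custom nodes are plain Python classes that declare their sockets and entry point through a small class-level contract. Below is a minimal sketch of that contract; the node class and its behavior are hypothetical, not code from the repo — only the `INPUT_TYPES` / `RETURN_TYPES` / `FUNCTION` / `NODE_CLASS_MAPPINGS` interface is ComfyUI's actual extension API.

```python
# Minimal sketch of a ComfyUI custom node, as used by packs like
# ComfyUI_VLM_nodes. The class and its behavior are hypothetical;
# only the class-level contract is ComfyUI's real extension API.

class VLMCaptionNode:
    """Hypothetical node: turns an image plus a question into text."""

    @classmethod
    def INPUT_TYPES(cls):
        # Declares the input sockets ComfyUI draws on the node.
        return {
            "required": {
                "image": ("IMAGE",),
                "question": ("STRING", {"default": "Describe this image."}),
            }
        }

    RETURN_TYPES = ("STRING",)  # one text output socket
    FUNCTION = "run"            # method ComfyUI calls when the node executes
    CATEGORY = "VLM"            # menu section in the node browser

    def run(self, image, question):
        # A real VLM node would run a vision-language model here; this
        # stub just echoes the question so the sketch stays self-contained.
        return (f"[caption placeholder for: {question}]",)


# ComfyUI discovers nodes via these mappings in the package's __init__.py.
NODE_CLASS_MAPPINGS = {"VLMCaptionNode": VLMCaptionNode}
NODE_DISPLAY_NAME_MAPPINGS = {"VLMCaptionNode": "VLM Caption (sketch)"}
```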
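
For the parsee-ai/parsee-datasets entry above: since the data is mirrored on the Hugging Face Hub, it should load with the standard `datasets` library. A minimal sketch, assuming a Hub-hosted dataset with a train split; the dataset id below is hypothetical, so browse https://huggingface.co/parsee-ai for the real names.

```python
# Minimal sketch, assuming a standard Hugging Face Hub dataset layout.
# The dataset id is hypothetical; see https://huggingface.co/parsee-ai
# for the actual dataset names.
from datasets import load_dataset

ds = load_dataset("parsee-ai/example-extraction-dataset")  # hypothetical id
print(ds)              # available splits and features
print(ds["train"][0])  # first example, assuming a "train" split exists
```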