multi-modality
There are 86 repositories under the multi-modality topic.
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
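LLaVA's recipe is worth a sketch: a frozen vision encoder produces patch features, a small projector maps them into the LLM's embedding space, and the projected visual tokens are prepended to the text embeddings. A minimal sketch of that connector, with the two-layer MLP and all dimensions as illustrative LLaVA-1.5-style assumptions, not the repo's exact code:

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Map vision-encoder patch features into the LLM embedding space."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # Two-layer MLP connector, in the spirit of LLaVA-1.5.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, vision_dim) from a frozen ViT.
        return self.proj(patch_feats)

# Illustrative forward pass with random stand-ins for the real encoders.
projector = VisionProjector()
image_feats = torch.randn(1, 576, 1024)   # e.g. 24x24 CLIP ViT patches
text_embeds = torch.randn(1, 32, 4096)    # embedded prompt tokens
visual_tokens = projector(image_feats)
llm_input = torch.cat([visual_tokens, text_embeds], dim=1)  # fed to the LLM
print(llm_input.shape)  # torch.Size([1, 608, 4096])
```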
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
jina-ai/clip-as-service
🏄 Scalable embedding, reasoning, and ranking for images and sentences with CLIP
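Usage follows a thin client/server split; a minimal sketch based on the project's documented clip_client API, assuming a clip-server instance is already listening on the port shown:

```python
from clip_client import Client

# Assumes a clip-server is running locally on this gRPC port.
c = Client('grpc://0.0.0.0:51000')

# Texts and image URIs can be mixed in one call; each item comes back as a
# fixed-size CLIP embedding suitable for ranking or retrieval.
embeddings = c.encode([
    'a photo of a surfer riding a wave',
    'path/to/photo.jpg',  # placeholder: a local file path or image URI
])
print(embeddings.shape)  # (2, embedding_dim)
```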
kyegomez/swarms
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
lucidrains/deep-daze
A simple command-line tool for text-to-image generation using OpenAI's CLIP and SIREN (an implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun
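The SIREN half is just an MLP with sine activations that maps pixel coordinates to colors; CLIP then scores the rendered image against the text prompt. A minimal sketch of a SIREN coordinate network with the customary w0 = 30 frequency scaling (initialization details omitted; layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class SirenLayer(nn.Module):
    """Linear layer followed by a sine activation, as in SIREN."""
    def __init__(self, in_dim: int, out_dim: int, w0: float = 30.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.w0 = w0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sin(self.w0 * self.linear(x))

# A coordinate network: (x, y) in [-1, 1]^2 -> RGB.
net = nn.Sequential(SirenLayer(2, 256), SirenLayer(256, 256), nn.Linear(256, 3))
coords = torch.rand(1024, 2) * 2 - 1  # random pixel coordinates
rgb = net(coords)                     # image values that CLIP would score
print(rgb.shape)  # torch.Size([1024, 3])
```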
EvolvingLMMs-Lab/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (an open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning abilities.
InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
DLR-RM/3DObjectTracking
Algorithms and Publications on 3D Object Tracking
OpenBMB/VisRAG
Parsing-free retrieval-augmented generation (RAG) powered by vision-language models (VLMs)
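Parsing-free means each document page is embedded directly as an image by a VLM and retrieved by embedding similarity, with no OCR or layout-parsing step. A generic cosine-similarity retrieval sketch; the embeddings below are random stand-ins for what VisRAG's retriever would produce:

```python
import torch
import torch.nn.functional as F

def retrieve(query_emb: torch.Tensor, page_embs: torch.Tensor, k: int = 3):
    """Return indices of the k page images most similar to the query.

    query_emb: (dim,) embedding of the user question.
    page_embs: (num_pages, dim) VLM embeddings of raw page images.
    """
    sims = F.cosine_similarity(query_emb.unsqueeze(0), page_embs, dim=-1)
    return sims.topk(k).indices

# Random stand-ins for real VLM embeddings.
query = torch.randn(768)
pages = torch.randn(100, 768)
print(retrieve(query, pages))  # top-3 page indices to feed the generator VLM
```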
OpenGVLab/Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
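Arena-style leaderboards typically aggregate pairwise human votes into an Elo-style rating; a minimal sketch of that update rule, with the conventional chess K-factor and scale as illustrative assumptions rather than this repo's exact settings:

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Update two model ratings after one head-to-head comparison.

    score_a is 1.0 if model A's answer won, 0.0 if it lost, 0.5 for a tie.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

# LLaVA beats MiniGPT-4 on one image-grounded question:
print(elo_update(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```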
LSXI7/MINIMA
[CVPR 2025] MINIMA: Modality Invariant Image Matching
kyegomez/Gemini
An open-source implementation of Gemini, the Google model touted to "eclipse ChatGPT"
researchmm/MM-Diffusion
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
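Whatever couples the two modalities, joint audio-video diffusion still rests on the standard forward noising step applied to both streams in parallel; a generic sketch of that step (shapes are illustrative, not the paper's):

```python
import torch

def q_sample(x0: torch.Tensor, alpha_bar_t: torch.Tensor,
             noise: torch.Tensor = None) -> torch.Tensor:
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    noise = torch.randn_like(x0) if noise is None else noise
    return alpha_bar_t.sqrt() * x0 + (1 - alpha_bar_t).sqrt() * noise

video = torch.randn(1, 16, 3, 64, 64)  # (batch, frames, C, H, W)
audio = torch.randn(1, 1, 25600)       # paired waveform segment
abar = torch.tensor(0.5)               # cumulative noise level at step t
video_t, audio_t = q_sample(video, abar), q_sample(audio, abar)
```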
ziqihuangg/Collaborative-Diffusion
[CVPR 2023] Collaborative Diffusion for Multi-Modal Face Generation and Editing
xiaoachen98/Open-LLaVA-NeXT
An open-source implementation for training LLaVA-NeXT.
kyegomez/Sophia
An effortless, plug-and-play optimizer designed to cut model training costs by 50%; reported to be 2x faster than Adam on LLMs.
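The core of Sophia is dividing gradient momentum by an EMA estimate of the diagonal Hessian and clipping the ratio elementwise, which is where the claimed speedup over Adam comes from. A minimal single-tensor sketch of that clipped update, assuming a squared-gradient stand-in for the Hessian estimate (the paper uses Gauss-Newton-Bartlett or Hutchinson estimators refreshed every few steps); not the repo's actual optimizer class:

```python
import torch

def sophia_step(param, grad, m, h, lr=1e-4, beta1=0.96, beta2=0.99,
                rho=0.04, eps=1e-12):
    """One Sophia-style update on a single tensor (in-place on param).

    m: EMA of gradients; h: EMA of a diagonal Hessian estimate.
    The per-coordinate ratio m / max(rho * h, eps) is clipped to [-1, 1].
    """
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    hess_est = grad * grad  # assumption: squared-gradient stand-in estimator
    h.mul_(beta2).add_(hess_est, alpha=1 - beta2)
    update = torch.clamp(m / torch.clamp(rho * h, min=eps), -1.0, 1.0)
    param.sub_(lr * update)

p = torch.randn(10)
m, h = torch.zeros(10), torch.zeros(10)
sophia_step(p, torch.randn(10), m, h)
```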
dvlab-research/VisionZip
Official repository for VisionZip (CVPR 2025)
RLHF-V/RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
DerrickWang005/CRIS.pytorch
An official PyTorch implementation of CRIS: CLIP-Driven Referring Image Segmentation
ZwwWayne/mmMOT
[ICCV 2019] Robust Multi-Modality Multi-Object Tracking
dvlab-research/UVTR
Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)
jackyjsy/CVPR21Chal-SLR
This repo contains the official code of our work SAM-SLR, which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.
yangcaoai/CoDA_NeurIPS2023
Official code for the NeurIPS 2023 paper "CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection"
sshh12/multi_token
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
ChenHongruixuan/BRIGHT
[IEEE GRSS DFC 2025 Track II] BRIGHT: A globally distributed multimodal VHR dataset for all-weather disaster response
jina-ai/rungpt
An open-source, cloud-native serving framework for large multi-modal models (LMMs).
Lee-Gihun/MEDIAR
(NeurIPS 2022 CellSeg Challenge - 1st Winner) Open source code for "MEDIAR: Harmony of Data-Centric and Model-Centric for Multi-Modality Microscopy"
dvlab-research/Prompt-Highlighter
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
kyegomez/Andromeda
An all-new language model that processes ultra-long sequences of 100,000+ tokens, ultra-fast
kyegomez/the-compiler
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
skit-ai/SpeechLLM
This repository contains the training, inference, and evaluation code for SpeechLLM models, along with details about the model releases on Hugging Face.
kyegomez/MambaByte
Implementation of MambaByte from the paper "MambaByte: Token-free Selective State Space Model", in PyTorch and Zeta
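Token-free means the model consumes raw UTF-8 bytes rather than subword tokens, so the entire "tokenizer" is the byte codec:

```python
text = "multi-modal 🦦"
byte_ids = list(text.encode("utf-8"))   # vocabulary is just 0..255
print(byte_ids)
print(bytes(byte_ids).decode("utf-8"))  # lossless round-trip
```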
kyegomez/MoE-Mamba
Implementation of MoE-Mamba from the paper "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts", in PyTorch and Zeta
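MoE-Mamba interleaves Mamba blocks with sparse mixture-of-experts feed-forward layers; the essential machinery is a router that sends each token to its top-k experts. A generic top-k routing sketch (expert count, dimensions, and k are illustrative defaults, not the repo's):

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Sparse MoE feed-forward layer with softmax top-k routing."""
    def __init__(self, dim: int = 512, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        logits = self.router(x)                           # (tokens, num_experts)
        weights, idx = logits.softmax(-1).topk(self.k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize top-k
        out = torch.zeros_like(x)
        for slot in range(self.k):        # dense loops for clarity, not speed
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens routed to expert e in this slot
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
```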
SsGood/MMGL
Multi-modal Graph Learning for Disease Prediction (IEEE Transactions on Medical Imaging, TMI 2022)
rsy6318/CorrI2P
[TCSVT] CorrI2P: Deep Image-to-Point Cloud Registration via Dense Correspondence. The official code of CorrI2P.
kyegomez/Kosmos2.5
My implementation of Kosmos-2.5 from the paper "KOSMOS-2.5: A Multimodal Literate Model"