multi-modal
There are 270 repositories under the multi-modal topic.
modelscope/modelscope
ModelScope: bring the notion of Model-as-a-Service to life.
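A minimal sketch of ModelScope's pipeline API, assuming `pip install modelscope`; the task name and model id below are illustrative and may not match what is currently hosted on the hub.

```python
# Hedged quick-start for ModelScope (assumes `pip install modelscope`).
# The model id is an assumption; substitute any model from the ModelScope hub.
from modelscope.pipelines import pipeline

captioner = pipeline('image-captioning',
                     model='damo/ofa_image-caption_coco_large_en')
result = captioner('path/to/image.jpg')  # returns a dict with the generated caption
print(result)
```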
lucidrains/DALLE-pytorch
Implementation/replication of DALL-E, OpenAI's text-to-image transformer, in PyTorch
THUDM/CogVLM
A state-of-the-art open visual language model | multimodal pretrained model
valhalla/valhalla
Open Source Routing Engine for OpenStreetMap
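Valhalla is typically queried over HTTP; below is a hedged sketch of a routing request against a locally running server, assuming the default port 8002 and illustrative coordinates.

```python
# Sketch of calling a local Valhalla server's /route endpoint
# (assumes a Valhalla instance listening on localhost:8002).
import requests

payload = {
    'locations': [
        {'lat': 52.5200, 'lon': 13.4050},  # start (illustrative: Berlin)
        {'lat': 52.5170, 'lon': 13.3889},  # destination
    ],
    'costing': 'auto',  # drive-time costing model
}
resp = requests.post('http://localhost:8002/route', json=payload)
print(resp.json()['trip']['summary'])  # distance and time summary
```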
marqo-ai/marqo
Unified embedding generation and search engine. Also available in the cloud at cloud.marqo.ai
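A hedged sketch of indexing and searching with the Marqo Python client, assuming `pip install marqo` and a Marqo server running locally; the index name and document fields are illustrative.

```python
# Minimal Marqo usage sketch (assumes a local Marqo server on the default port).
import marqo

mq = marqo.Client(url='http://localhost:8882')
mq.create_index('my-multimodal-index')
mq.index('my-multimodal-index').add_documents(
    [{'_id': 'doc1', 'Title': 'A photo of a horse on a beach'}],
    tensor_fields=['Title'],  # fields to embed for vector search
)
results = mq.index('my-multimodal-index').search('equine animal near the sea')
print(results['hits'][0])
```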
THUDM/VisualGLM-6B
A Chinese-English bilingual multimodal conversational language model
OFA-Sys/Chinese-CLIP
A Chinese version of CLIP that performs Chinese cross-modal retrieval and representation generation.
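A hedged zero-shot classification sketch following the Chinese-CLIP README, assuming `pip install cn_clip`; treat the function names and checkpoint name as assumptions.

```python
# Zero-shot image-text matching sketch with Chinese-CLIP (assumes `pip install cn_clip`).
import torch
from PIL import Image
import cn_clip.clip as clip
from cn_clip.clip import load_from_name

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model, preprocess = load_from_name('ViT-B-16', device=device)
image = preprocess(Image.open('example.jpg')).unsqueeze(0).to(device)
text = clip.tokenize(['一只猫', '一只狗']).to(device)  # "a cat", "a dog"
with torch.no_grad():
    logits_per_image, _ = model.get_similarity(image, text)
    print(logits_per_image.softmax(dim=-1))  # image-text match probabilities
```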
zjunlp/DeepKE
[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V | a commercially usable open-source multimodal dialogue model approaching GPT-4V performance
docarray/docarray
Represent, send, store and search multimodal data
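A hedged sketch of representing a multimodal document with DocArray's v2-style typed documents, assuming `pip install docarray`; the field names and URL are illustrative.

```python
# Representing paired text+image data with DocArray (assumes `pip install docarray`).
from docarray import BaseDoc
from docarray.typing import ImageUrl

class CaptionedImage(BaseDoc):
    caption: str
    image: ImageUrl

doc = CaptionedImage(
    caption='a horse on a beach',
    image='https://example.com/horse.png',  # hypothetical URL
)
tensor = doc.image.load()  # fetches and decodes the image into an array
```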
PKU-YuanGroup/Video-LLaVA
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
modelscope/agentscope
Start building LLM-empowered multi-agent applications in an easier way.
SciSharp/LLamaSharp
A C#/.NET library to run LLMs (🦙 LLaMA/LLaVA) efficiently on your local device.
Kav-K/GPTDiscord
A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI moderation, custom indexes/knowledge base, YouTube summarizer, and more!
PKU-YuanGroup/MoE-LLaVA
Mixture-of-Experts for Large Vision-Language Models
modelscope/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ 🍸 🍹 🍷
dvlab-research/LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
QIN2DIM/hcaptcha-challenger
🥂 Gracefully solve hCaptcha challenges with an embedded MoE (ONNX) solution.
DirtyHarryLYL/Transformer-in-Vision
A collection of recent Transformer-based computer vision and related works.
OpenMotionLab/MotionGPT
[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs
THUDM/CogVLM2
GPT-4V-level open-source multi-modal model based on Llama3-8B
tangxyw/RecSysPapers
A collection of classic and cutting-edge industry papers in the fields of recommendation, advertising, and search.
MedMNIST/MedMNIST
[pip install medmnist] 18x Standardized Datasets for 2D and 3D Biomedical Image Classification
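A hedged sketch of loading one of the MedMNIST datasets via the pip package named in the description; PathMNIST is one of the standardized 2D classification sets, and the usage below follows the project's documented interface but should be treated as an assumption.

```python
# Loading a MedMNIST dataset (assumes `pip install medmnist`).
from medmnist import PathMNIST

train = PathMNIST(split='train', download=True)  # downloads on first use
img, label = train[0]            # PIL image and its class label array
print(len(train), img.size, label)
```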
IntelLabs/fastRAG
Efficient Retrieval Augmentation and Generation Framework
vercel/modelfusion
The TypeScript library for building AI applications.
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
jokieleung/awesome-visual-question-answering
A curated list of Visual Question Answering (VQA, including image/video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
microsoft/farmvibes-ai
FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability
salesforce/UniControl
Unified Controllable Visual Generation Model
PKU-YuanGroup/LanguageBind
[ICLR 2024] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 50+ Hugging Face models, and 20+ benchmarks
v-iashin/SpecVQGAN
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
boschresearch/OASIS
Official implementation of the paper "You Only Need Adversarial Supervision for Semantic Image Synthesis" (ICLR 2021)
Tebmer/Awesome-Knowledge-Distillation-of-LLMs
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
wangsuzhen/Audio2Head
Code for the IJCAI 2021 paper "Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion"
kyegomez/RT-2
Democratization of RT-2: an open-source implementation of "RT-2: New model translates vision and language into action"