multimodal
There are 1,064 repositories under the multimodal topic.
Mintplex-Labs/anything-llm
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
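For context, a minimal sketch of running a LLaVA-style model for visual instruction following. It uses the community llava-hf checkpoint served through Hugging Face transformers rather than this repository's own inference scripts, and the image path is a placeholder.

    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    # Community-converted LLaVA-1.5 checkpoint (assumption: a transformers version with LLaVA support).
    model_id = "llava-hf/llava-1.5-7b-hf"
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )

    image = Image.open("example.jpg")  # hypothetical local image
    prompt = "USER: <image>\nDescribe this image. ASSISTANT:"
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    print(processor.decode(output[0], skip_special_tokens=True))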
jina-ai/serve
☁️ Build multimodal AI applications with a cloud-native stack
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
deepseek-ai/Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
mediar-ai/screenpipe
AI app store powered by 24/7 desktop history. Open source | 100% local | dev friendly | 24/7 screen and mic recording
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
modelscope/ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Phi4, ...) (AAAI 2025).
rerun-io/rerun
Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
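A minimal sketch of logging one image frame and a text event with the Rerun Python SDK (rerun-sdk); the application id and entity paths are made up for illustration.

    import numpy as np
    import rerun as rr  # pip install rerun-sdk

    rr.init("multimodal_demo", spawn=True)             # hypothetical app id; spawns the viewer
    frame = np.zeros((480, 640, 3), dtype=np.uint8)    # placeholder camera frame
    rr.log("camera/image", rr.Image(frame))            # stream the image under an entity path
    rr.log("events", rr.TextLog("processed frame 0"))  # attach a text event alongside it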
bentoml/BentoML
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
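A minimal sketch of exposing a function as an inference API with BentoML's service/api decorators (the 1.2+ Python SDK); the class and method names are hypothetical.

    import bentoml

    @bentoml.service
    class Echo:
        # Exposes a single REST endpoint; run locally with: bentoml serve <module>:Echo
        @bentoml.api
        def echo(self, text: str) -> str:
            return text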
enricoros/big-AGI
AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.
SkalskiP/courses
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
swyxio/ai-notes
Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.
X-PLUG/MobileAgent
Mobile-Agent: The Powerful GUI Agent Family
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
TEN-framework/TEN-Agent
TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking, and is fully compatible with platforms like Dify and Coze.
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
PySpur-Dev/pyspur
A visual playground for agentic workflows: Iterate over your agents 10x faster
kyegomez/swarms
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
PKU-Alignment/align-anything
Align Anything: Training All-Modality Models with Feedback
kyegomez/tree-of-thoughts
Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that elevates model reasoning by at least 70%
luban-agi/Awesome-AIGC-Tutorials
Curated tutorials and resources for Large Language Models, AI Painting, and more.
rom1504/img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20 hours on one machine.
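A minimal sketch of the library's Python entry point (the same options back the CLI); the file names and exact option values here are illustrative.

    from img2dataset import download  # pip install img2dataset

    download(
        url_list="urls.txt",         # hypothetical text file with one image URL per line
        output_folder="images",      # hypothetical output directory
        image_size=256,              # resize target in pixels
        thread_count=64,             # parallel download threads
        output_format="webdataset",  # pack images + metadata into .tar shards
    )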
IDEA-CCNL/Fengshenbang-LM
Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center at IDEA Research Institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.
jina-ai/discoart
🪩 Create Disco Diffusion artworks in one line
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
NExT-GPT/NExT-GPT
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
atfortes/Awesome-LLM-Reasoning
From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓
OpenGVLab/InternGPT
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
microsoft/torchscale
Foundation Architecture for (M)LLMs
docarray/docarray
Represent, send, store and search multimodal data
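A minimal sketch of a multimodal schema with DocArray's BaseDoc/DocList API (v2-style); the field names, example URL, and embedding size are made up.

    from typing import Optional

    from docarray import BaseDoc, DocList
    from docarray.typing import ImageUrl, NdArray

    class CaptionedImage(BaseDoc):                 # hypothetical schema
        image: ImageUrl                            # validated image URL
        caption: str
        embedding: Optional[NdArray[512]] = None   # to be filled by an encoder later

    docs = DocList[CaptionedImage](
        [CaptionedImage(image="https://example.com/cat.png", caption="a cat")]
    )
    print(docs[0].caption)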
InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
iterative/datachain
ETL, Analytics, Versioning for Unstructured Data
rom1504/clip-retrieval
Easily compute CLIP embeddings and build a CLIP retrieval system with them
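A minimal sketch of querying a hosted index with the ClipClient helper; the service URL and index name follow the project's documented example and may not be live, and the result keys are assumptions.

    from clip_retrieval.clip_client import ClipClient  # pip install clip-retrieval

    client = ClipClient(
        url="https://knn.laion.ai/knn-service",  # example backend from the docs; may be offline
        indice_name="laion5B-L-14",              # example index name
        num_images=5,
    )
    results = client.query(text="an orange tabby cat sleeping")
    for r in results:
        print(r.get("url"), r.get("similarity"))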
roboflow/maestro
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.
OFA-Sys/OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework