multimodal
There are 662 repositories under the multimodal topic.
jina-ai/jina
☁️ Build multimodal AI applications with a cloud-native stack
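For context, a minimal sketch of the Executor/Deployment pattern Jina serves, assuming a recent Jina 3.x with DocArray v2; the `MyExec` class, port, and upper-casing logic are illustrative, not part of the library:

```python
from jina import Deployment, Executor, requests
from docarray import DocList
from docarray.documents import TextDoc

class MyExec(Executor):
    @requests  # handle every request routed to this Executor
    def upcase(self, docs: DocList[TextDoc], **kwargs) -> DocList[TextDoc]:
        for doc in docs:
            doc.text = doc.text.upper()  # placeholder processing step
        return docs

# Serve the Executor as a standalone gRPC/HTTP microservice.
dep = Deployment(uses=MyExec, port=12345)
with dep:
    dep.block()
```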
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, multimodal models, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
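A hedged sketch of loading a pretrained NeMo ASR checkpoint; the model name and audio path are assumptions, and any NeMo-hosted ASR checkpoint should follow the same pattern:

```python
import nemo.collections.asr as nemo_asr

# Assumed checkpoint name; NeMo resolves it from NVIDIA's model registry.
asr_model = nemo_asr.models.ASRModel.from_pretrained("stt_en_conformer_ctc_large")

# Transcribe a local WAV file (the path is a placeholder).
transcripts = asr_model.transcribe(["speech_sample.wav"])
print(transcripts[0])
```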
bentoml/BentoML
The most flexible way to serve AI/ML models in production: build model inference services, LLM APIs, inference graphs/pipelines, compound AI systems, multimodal services, RAG as a service, and more!
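A minimal sketch of BentoML's service pattern, assuming the 1.2+ `@bentoml.service` API; the `Echo` service and its endpoint are illustrative:

```python
import bentoml

@bentoml.service  # turns the class into a deployable inference service
class Echo:
    @bentoml.api  # exposes the method as an HTTP endpoint
    def echo(self, text: str) -> str:
        return text

# Run locally with the BentoML CLI, e.g.: bentoml serve service:Echo
```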
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
rerun-io/rerun
Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
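A small sketch of logging multimodal streams with the Rerun Python SDK; the app id and the dummy camera frame are placeholders:

```python
import numpy as np
import rerun as rr

rr.init("multimodal_demo", spawn=True)  # spawn the Rerun viewer locally

# Log an image stream and a text stream under separate entity paths.
frame = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder camera frame
rr.log("camera/image", rr.Image(frame))
rr.log("events", rr.TextLog("frame 0 received"))
```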
swyxio/ai-notes
Notes for software engineers getting up to speed on new AI developments. Serves as the datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.
SkalskiP/courses
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)
enricoros/big-AGI
Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
kyegomez/tree-of-thoughts
Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models, which elevates model reasoning by at least 70%.
IDEA-CCNL/Fengshenbang-LM
Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute, serving as foundational infrastructure for Chinese AIGC and cognitive intelligence.
jina-ai/discoart
🪩 Create Disco Diffusion artworks in one line
luban-agi/Awesome-AIGC-Tutorials
Curated tutorials and resources for Large Language Models, AI Painting, and more.
rom1504/img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20 hours on a single machine.
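A minimal sketch of the bulk-download API; the file names and worker counts are illustrative:

```python
from img2dataset import download

download(
    url_list="urls.txt",          # one image URL per line (placeholder path)
    output_folder="images",
    output_format="webdataset",   # shards suitable for large-scale training
    image_size=256,               # resize on the fly while downloading
    processes_count=16,
    thread_count=64,
)
```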
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
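A one-call inference sketch with MMPretrain's high-level API; the model alias and image path are assumptions, and weights download on first use:

```python
from mmpretrain import inference_model

# "resnet18_8xb32_in1k" is an assumed MMPretrain config alias.
result = inference_model("resnet18_8xb32_in1k", "demo.jpg")
print(result["pred_class"])
```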
OpenGVLab/InternGPT
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, GPT-4-style multimodal chat, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).
microsoft/torchscale
Foundation Architecture for (M)LLMs
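A minimal sketch of instantiating a TorchScale encoder, following the pattern shown in the project README; the vocabulary size is arbitrary:

```python
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

config = EncoderConfig(vocab_size=64000)  # arbitrary vocabulary size
model = Encoder(config)
print(model)
```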
NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
docarray/docarray
Represent, send, store and search multimodal data
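A small sketch of modeling one multimodal record with DocArray v2; the `PageDoc` schema, field names, and URL are illustrative:

```python
from typing import Optional

from docarray import BaseDoc
from docarray.typing import ImageUrl, NdArray

class PageDoc(BaseDoc):  # hypothetical schema combining text, image, and vector
    text: str
    image: ImageUrl
    embedding: Optional[NdArray[128]] = None

doc = PageDoc(text="a cat on a mat", image="https://example.com/cat.png")
```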
Stability-AI/stability-sdk
SDK for interacting with stability.ai APIs (e.g. Stable Diffusion inference)
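A hedged sketch of text-to-image inference through the SDK, assuming an API key in the `STABILITY_KEY` environment variable; the prompt and output path are placeholders, and response handling follows the README pattern:

```python
import io
import os

from PIL import Image
from stability_sdk import client
import stability_sdk.interfaces.gooseai.generation.generation_pb2 as generation

stability_api = client.StabilityInference(key=os.environ["STABILITY_KEY"])
answers = stability_api.generate(prompt="a lighthouse at dawn")

# Each response may carry image artifacts; decode the binary payloads.
for resp in answers:
    for artifact in resp.artifacts:
        if artifact.type == generation.ARTIFACT_IMAGE:
            Image.open(io.BytesIO(artifact.binary)).save("out.png")
```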
OFA-Sys/OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
rom1504/clip-retrieval
Easily compute CLIP embeddings and build a CLIP retrieval system with them
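A minimal query sketch against a hosted CLIP index; the LAION-hosted backend URL and index name follow the project README and may change or be unavailable:

```python
from clip_retrieval.clip_client import ClipClient

client = ClipClient(
    url="https://knn.laion.ai/knn-service",  # public demo backend
    indice_name="laion5B-L-14",
    num_images=10,
)
results = client.query(text="an orange cat")  # dicts with url/caption/similarity
print(results[0])
```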
X-PLUG/mPLUG-Owl
mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
Yutong-Zhou-cv/Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
X-PLUG/MobileAgent
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
alan-ai/alan-sdk-android
Conversational AI SDK for Android to enable text and voice conversations with actions (Java, Kotlin)
alan-ai/alan-sdk-flutter
Conversational AI SDK for Flutter to enable text and voice conversations with actions (iOS and Android)
InternLM/InternLM-XComposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
alan-ai/alan-sdk-ionic
Conversational AI SDK for Ionic to enable text and voice conversations with actions (React, Angular, Vue)
autodistill/autodistill
Images to inference with no labeling (use foundation models to train supervised models).
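A sketch of the distillation workflow, assuming the separately installed `autodistill-grounded-sam` plugin; the ontology, folder paths, and keyword arguments are illustrative and may differ across versions:

```python
from autodistill.detection import CaptionOntology
from autodistill_grounded_sam import GroundedSAM  # plugin package, installed separately

# Map foundation-model prompts to the class names the target dataset should use.
base_model = GroundedSAM(ontology=CaptionOntology({"shipping container": "container"}))

# Auto-label a folder of raw images; the output is a labeled detection dataset
# that a small supervised model can then be trained on.
base_model.label(input_folder="./images", extension=".jpg", output_folder="./dataset")
```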
modelscope/swift
ms-swift: use PEFT or full-parameter training to fine-tune 200+ LLMs or 15+ MLLMs
invictus717/MetaTransformer
Meta-Transformer for Unified Multimodal Learning
open-mmlab/Multimodal-GPT
Multimodal-GPT
kyegomez/BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
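A hedged sketch of the drop-in layer this implementation exposes; `BitLinear` and its constructor shape follow the project README and may differ across versions:

```python
import torch
from bitnet import BitLinear  # drop-in replacement for torch.nn.Linear

layer = BitLinear(512, 256)      # in_features, out_features (illustrative sizes)
y = layer(torch.randn(1, 512))   # forward pass uses 1-bit-quantized weights
print(y.shape)
```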
Eurus-Holmes/Awesome-Multimodal-Research
A curated list of Multimodal Related Research.