multimodal

There are 662 repositories under the multimodal topic.

  • jina-ai/jina

    ☁️ Build multimodal AI applications with a cloud-native stack

    Language: Python · 20.2k stars
  • microsoft/unilm

    Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

    Language: Python · 18.6k stars
  • haotian-liu/LLaVA

    [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

    Language: Python · 16.8k stars
  • NVIDIA/NeMo

    A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

    Language: Python · 10.2k stars
  • bentoml/BentoML

    The most flexible way to serve AI/ML models in production - build model inference services, LLM APIs, inference graphs/pipelines, compound AI systems, multimodal apps, RAG as a service, and more. (A minimal service sketch appears after this list.)

    Language: Python · 6.6k stars
  • facebookresearch/mmf

    A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

    Language: Python · 5.4k stars
  • rerun-io/rerun

    Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui. (A Python logging sketch appears after this list.)

    Language: Rust · 5.3k stars
  • swyxio/ai-notes

    Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.

    Language: HTML · 4.7k stars
  • SkalskiP/courses

    This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI).

    Language: Python · 4.6k stars
  • enricoros/big-AGI

    Generative AI suite powered by state-of-the-art models, providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.

    Language: TypeScript · 4.4k stars
  • kyegomez/tree-of-thoughts

    Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models, which elevates model reasoning by at least 70%.

    Language: Python · 4.1k stars
  • IDEA-CCNL/Fengshenbang-LM

    Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center at IDEA (International Digital Economy Academy), intended as foundational infrastructure for Chinese AIGC and cognitive intelligence.

    Language: Python · 3.9k stars
  • jina-ai/discoart

    🪩 Create Disco Diffusion artworks in one line. (The one-liner is sketched after this list.)

    Language: Python · 3.8k stars
  • luban-agi/Awesome-AIGC-Tutorials

    Curated tutorials and resources for Large Language Models, AI Painting, and more.

  • rom1504/img2dataset

    Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine. (A minimal invocation is sketched after this list.)

    Language: Python · 3.3k stars
  • open-mmlab/mmpretrain

    OpenMMLab Pre-training Toolbox and Benchmark

    Language: Python · 3.2k stars
  • OpenGVLab/InternGPT

    InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).

    Language: Python · 3.1k stars
  • microsoft/torchscale

    Foundation Architecture for (M)LLMs

    Language: Python · 2.9k stars
  • NExT-GPT/NExT-GPT

    Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

    Language: Python · 2.9k stars
  • docarray/docarray

    Represent, send, store, and search multimodal data. (A short data-modelling sketch appears after this list.)

    Language: Python · 2.8k stars
  • Stability-AI/stability-sdk

    SDK for interacting with stability.ai APIs (e.g. stable diffusion inference)

    Language: Jupyter Notebook · 2.4k stars
  • OFA-Sys/OFA

    Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

    Language: Python · 2.3k stars
  • rom1504/clip-retrieval

    Easily compute CLIP embeddings and build a CLIP retrieval system with them. (A client query sketch appears after this list.)

    Language: Jupyter Notebook · 2.2k stars
  • X-PLUG/mPLUG-Owl

    mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model

    Language: Python · 2k stars
  • Yutong-Zhou-cv/Awesome-Text-to-Image

    (ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

  • X-PLUG/MobileAgent

    Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception

    Language: Python · 1.9k stars
  • alan-ai/alan-sdk-android

    Conversational AI SDK for Android to enable text and voice conversations with actions (Java, Kotlin)

  • alan-ai/alan-sdk-flutter

    Conversational AI SDK for Flutter to enable text and voice conversations with actions (iOS and Android)

    Language: Ruby · 1.8k stars
  • InternLM/InternLM-XComposer

    InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

    Language: Python · 1.7k stars
  • alan-ai/alan-sdk-ionic

    Conversational AI SDK for Ionic to enable text and voice conversations with actions (React, Angular, Vue)

    Language: TypeScript · 1.7k stars
  • autodistill/autodistill

    Images to inference with no labeling (use foundation models to train supervised models).

    Language: Python · 1.6k stars
  • modelscope/swift

    ms-swift: use PEFT or full-parameter training to fine-tune 200+ LLMs or 15+ MLLMs

    Language: Python · 1.5k stars
  • invictus717/MetaTransformer

    Meta-Transformer for Unified Multimodal Learning

    Language: Python · 1.4k stars
  • open-mmlab/Multimodal-GPT

    Multimodal-GPT

    Language: Python · 1.4k stars
  • kyegomez/BitNet

    Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch

    Language: Python · 1.4k stars
  • Eurus-Holmes/Awesome-Multimodal-Research

    A curated list of multimodal-related research.

    Language: Python · 1.3k stars
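
Usage sketches for a few of the tooling repositories above follow. All of them are minimal, hedged examples: every file name, prompt, and parameter value is illustrative rather than taken from the projects' documentation.

For bentoml/BentoML, a sketch of defining an inference service with the BentoML 1.2-style `@bentoml.service` / `@bentoml.api` decorators; the class name, endpoint, and resource hint are made up, and a real service would load and call a model instead of echoing.

```python
import bentoml


@bentoml.service(resources={"cpu": "1"})  # resource hint is illustrative
class Echo:
    @bentoml.api
    def echo(self, text: str) -> str:
        # A real multimodal service would run a model here; this just echoes the input.
        return text
```

Assuming the file is saved as service.py, `bentoml serve service:Echo` should expose it as an HTTP API.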
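
For rerun-io/rerun, a sketch of streaming multimodal data to the viewer from the Python SDK; it assumes a recent rerun-sdk (0.9+) where `rr.log` takes archetypes such as `rr.Image` and `rr.Points3D` (older releases used `rr.log_image` / `rr.log_points`).

```python
import numpy as np
import rerun as rr

rr.init("multimodal_demo", spawn=True)  # spawn=True opens the Rerun viewer locally

# Synthetic stand-ins for real sensor streams.
rgb = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
points = np.random.rand(100, 3).astype(np.float32)

rr.log("camera/rgb", rr.Image(rgb))          # an image stream
rr.log("world/points", rr.Points3D(points))  # a 3D point-cloud stream
```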
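
For jina-ai/discoart, the "one line" in the description is roughly the `create()` call below; note that it downloads Disco Diffusion weights and realistically needs a GPU, and the prompt is just an example.

```python
from discoart import create

# Returns a DocumentArray holding the generated image(s) and their run config.
da = create(text_prompts="a lighthouse in a storm, matte painting, artstation")
```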
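
For rom1504/img2dataset, a minimal sketch of the Python entry point (the CLI exposes the same options); the input file name, image size, and worker counts are illustrative.

```python
from img2dataset import download

download(
    url_list="my_urls.txt",       # one image URL per line
    input_format="txt",
    output_folder="images_out",
    output_format="webdataset",   # shard images + metadata into .tar files
    image_size=256,               # resize while downloading
    processes_count=4,
    thread_count=16,
)
```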
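
For docarray/docarray, a short sketch of modelling a multimodal item with the typed-document API (DocArray 0.30+); the `Product` schema and URL are invented for illustration.

```python
from docarray import BaseDoc, DocList
from docarray.typing import ImageUrl


class Product(BaseDoc):
    title: str
    image: ImageUrl  # validated URL field; .load() fetches and decodes the image


docs = DocList[Product](
    [Product(title="red shoe", image="https://example.com/shoe.png")]
)
print(docs[0].title, docs[0].image)
```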
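
For rom1504/clip-retrieval, a sketch of querying a hosted CLIP index with the bundled client; the backend URL and index name follow the project's public LAION demo, whose availability is not guaranteed.

```python
from clip_retrieval.clip_client import ClipClient, Modality

client = ClipClient(
    url="https://knn.laion.ai/knn-service",  # public demo backend
    indice_name="laion5B-L-14",
    modality=Modality.IMAGE,
    num_images=5,
)

results = client.query(text="an orange tabby cat sleeping")
for r in results:
    print(r["url"], r["similarity"])
```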