multi-modality

There are 68 repositories under the multi-modality topic.

  • haotian-liu/LLaVA

    [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

    Language: Python · 17.2k stars
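
    A minimal inference sketch in the style of the quick-start shown in the LLaVA README; the model path and image URL below are illustrative, and exact argument names may differ across versions:

        # Hedged sketch: assumes the llava package (from this repo) is installed
        # and that model weights can be fetched from the Hugging Face Hub.
        from llava.mm_utils import get_model_name_from_path
        from llava.eval.run_llava import eval_model

        model_path = "liuhaotian/llava-v1.5-7b"

        args = type("Args", (), {
            "model_path": model_path,
            "model_base": None,
            "model_name": get_model_name_from_path(model_path),
            "query": "What is shown in this image?",
            "conv_mode": None,
            "image_file": "https://llava-vl.github.io/static/images/view.jpg",
            "sep": ",",
            "temperature": 0,
            "top_p": None,
            "num_beams": 1,
            "max_new_tokens": 512,
        })()

        eval_model(args)  # runs vision-language inference and prints the answer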
  • jina-ai/clip-as-service

    🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

    Language: Python · 12.2k stars
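
    A minimal client sketch, assuming "pip install clip-client" and a clip_server instance already listening on localhost port 51000; the address and the image URI are placeholders:

        # Hedged sketch of the clip-as-service client API.
        from clip_client import Client

        c = Client('grpc://0.0.0.0:51000')  # assumed server address

        # encode() maps each input to a fixed-size CLIP embedding; per the
        # project docs, plain strings are treated as text and URIs as images.
        vectors = c.encode([
            'a photo of a surfer riding a wave',  # text input
            'https://example.com/surfer.jpg',     # placeholder image URI
        ])
        print(vectors.shape)  # (2, dim), e.g. (2, 512) for a ViT-B/32 backbone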
  • BradyFU/Awesome-Multimodal-Large-Language-Models

    ✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.

  • lucidrains/deep-daze

    Simple command-line tool for text-to-image generation using OpenAI's CLIP and a SIREN (implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun

    Language: Python · 4.4k stars
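
    A minimal sketch of the Python API behind the "imagine" CLI, based on the project README; parameter values are illustrative and a CUDA GPU is assumed:

        # Hedged sketch: assumes "pip install deep-daze".
        from deep_daze import Imagine

        imagine = Imagine(
            text='cosmic love and attention',  # prompt the SIREN is optimized toward
            num_layers=24,                     # depth of the SIREN network
            save_every=100,                    # save an intermediate image every N steps
        )
        imagine()  # runs the CLIP-guided optimization loop, writing images to disk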
  • Luodian/Otter

    🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

    Language: Python · 3.5k stars
  • InternLM/InternLM-XComposer

    InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

    Language: Python · 1.8k stars
  • kyegomez/swarms

    Orchestrate swarms of agents from any framework (OpenAI, LangChain, etc.) for business-operation automation. Join our community: https://discord.gg/DbjBMJTSWD

    Language: Python · 759 stars
  • DLR-RM/3DObjectTracking

    Algorithms and Publications on 3D Object Tracking

    Language: C++ · 614 stars
  • OpenGVLab/Multi-Modality-Arena

    Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

    Language: Python · 390 stars
  • ziqihuangg/Collaborative-Diffusion

    Collaborative Diffusion (CVPR 2023)

    Language: Python · 386 stars
  • kyegomez/Gemini

    An open-source implementation of Gemini, the Google model said to "eclipse ChatGPT"

    Language: Python · 377 stars
  • kyegomez/Sophia

    Effortless plug-and-play optimizer to cut model training costs by 50%. A new optimizer that is 2x faster than Adam on LLMs.

    Language: Python · 365 stars
  • researchmm/MM-Diffusion

    [CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

    Language: Python · 344 stars
  • ZwwWayne/mmMOT

    [ICCV 2019] Robust Multi-Modality Multi-Object Tracking

    Language: Python · 252 stars
  • DerrickWang005/CRIS.pytorch

    An official PyTorch implementation of the CRIS paper

    Language: Python · 230 stars
  • dvlab-research/UVTR

    Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)

    Language: Python · 219 stars
  • jackyjsy/CVPR21Chal-SLR

    This repo contains the official code of our work SAM-SLR, which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.

    Language: Python · 199 stars
  • RLHF-V/RLHF-V

    [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

    Language: Python · 157 stars
  • yangcaoai/CoDA_NeurIPS2023

    Official code for the NeurIPS 2023 paper "CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection"

    Language: Jupyter Notebook · 156 stars
  • sshh12/multi_token

    Embed arbitrary modalities (images, audio, documents, etc.) into large language models.

    Language: Python · 151 stars
  • kyegomez/the-compiler

    Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!

    Language: Python · 142 stars
  • jina-ai/rungpt

    An open-source, cloud-native serving framework for large multi-modal models (LMMs).

    Language: Python · 141 stars
  • Lee-Gihun/MEDIAR

    (NeurIPS 2022 CellSeg Challenge - 1st Winner) Open source code for "MEDIAR: Harmony of Data-Centric and Model-Centric for Multi-Modality Microscopy"

    Language: Python · 126 stars
  • kyegomez/Andromeda

    An all-new language model that processes ultra-long sequences of 100,000+ tokens, ultra-fast.

    Language: Python · 123 stars
  • dvlab-research/Prompt-Highlighter

    [CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs

    Language: Python · 104 stars
  • kyegomez/MambaByte

    Implementation of MambaByte from the paper "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta

    Language: Python · 84 stars
  • SsGood/MMGL

    Multi-modal Graph Learning for Disease Prediction (IEEE Transactions on Medical Imaging, TMI 2022)

    Language: Jupyter Notebook · 82 stars
  • kyegomez/MoE-Mamba

    Implementation of MoE-Mamba from the paper "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Zeta

    Language: Python · 62 stars
  • rentainhe/TRAR-VQA

    [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"

    Language: Python · 61 stars
  • kyegomez/Kosmos2.5

    My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"

    Language: Python · 59 stars
  • amazon-science/crossmodal-contrastive-learning

    CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021

    Language: Python · 56 stars
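
    For orientation, CrossCLR belongs to the family of symmetric cross-modal contrastive objectives; the generic CLIP-style InfoNCE sketch below illustrates that family and is NOT the repo's exact loss (the paper builds on this idea with intra-modality terms and sample weighting):

        # Generic symmetric cross-modal InfoNCE loss (illustrative only).
        import torch
        import torch.nn.functional as F

        def cross_modal_infonce(video_emb, text_emb, temperature=0.07):
            """video_emb, text_emb: (batch, dim) L2-normalized embeddings of paired clips/captions."""
            logits = video_emb @ text_emb.t() / temperature  # (batch, batch) similarities
            targets = torch.arange(len(logits), device=logits.device)  # matches on the diagonal
            loss_v2t = F.cross_entropy(logits, targets)      # video -> text direction
            loss_t2v = F.cross_entropy(logits.t(), targets)  # text -> video direction
            return (loss_v2t + loss_t2v) / 2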
  • ecom-research/ComposeAE

    Official code for WACV 2021 paper - Compositional Learning of Image-Text Query for Image Retrieval

    Language: Python · 54 stars
  • rsy6318/CorrI2P

    [TCSVT] CorrI2P: Deep Image-to-Point Cloud Registration via Dense Correspondence

    Language: Python · 53 stars
  • OpenGVLab/LORIS

    Long-Term Rhythmic Video Soundtracker, ICML 2023

    Language: Python · 51 stars
  • xiaoachen98/Open-LLaVA-NeXT

    An open-source implementation of LLaVA-NeXT.

    Language: Python · 43 stars