multi-modality
There are 68 repositories under the multi-modality topic.
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
jina-ai/clip-as-service
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
lucidrains/deep-daze
Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun
Luodian/Otter
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
InternLM/InternLM-XComposer
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
kyegomez/swarms
Orchestrate Swarms of Agents From Any Framework Like OpenAI, Langchain, etc., for Business Operation Automation. Join our Community: https://discord.gg/DbjBMJTSWD
DLR-RM/3DObjectTracking
Algorithms and Publications on 3D Object Tracking
OpenGVLab/Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
ziqihuangg/Collaborative-Diffusion
Collaborative Diffusion (CVPR 2023)
kyegomez/Gemini
The open source implementation of Gemini, the model that will "eclipse ChatGPT" by Google
kyegomez/Sophia
Effortless plug-and-play optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.
researchmm/MM-Diffusion
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
ZwwWayne/mmMOT
[ICCV2019] Robust Multi-Modality Multi-Object Tracking
DerrickWang005/CRIS.pytorch
An official PyTorch implementation of the CRIS paper
dvlab-research/UVTR
Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)
jackyjsy/CVPR21Chal-SLR
This repo contains the official code of our work SAM-SLR which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.
RLHF-V/RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
yangcaoai/CoDA_NeurIPS2023
Official code for NeurIPS2023 paper: CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
sshh12/multi_token
Embed arbitrary modalities (images, audio, documents, etc) into large language models.
kyegomez/the-compiler
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
jina-ai/rungpt
An open-source, cloud-native serving framework for large multi-modal models (LMMs).
Lee-Gihun/MEDIAR
(NeurIPS 2022 CellSeg Challenge - 1st Winner) Open source code for "MEDIAR: Harmony of Data-Centric and Model-Centric for Multi-Modality Microscopy"
kyegomez/Andromeda
An all-new Language Model That Processes Ultra-Long Sequences of 100,000+ Ultra-Fast
dvlab-research/Prompt-Highlighter
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
kyegomez/MambaByte
Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in Pytorch and Zeta
SsGood/MMGL
Multi-modal Graph learning for Disease Prediction (IEEE Trans. on Medical imaging, TMI2022)
kyegomez/MoE-Mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Zeta
rentainhe/TRAR-VQA
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
kyegomez/Kosmos2.5
My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"
amazon-science/crossmodal-contrastive-learning
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021
ecom-research/ComposeAE
Official code for WACV 2021 paper - Compositional Learning of Image-Text Query for Image Retrieval
rsy6318/CorrI2P
[TCSVT] CorrI2P: Deep Image-to-Point Cloud Registration via Dense Correspondence. The code of CorrI2P.
OpenGVLab/LORIS
Long-Term Rhythmic Video Soundtracker, ICML2023
xiaoachen98/Open-LLaVA-NeXT
An open-source implementation of LLaVA-NeXT.