multimodal-learning
There are 239 repositories under the multimodal-learning topic.
pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
Eurus-Holmes/Awesome-Multimodal-Research
A curated list of Multimodal Related Research.
DmitryRyumin/ICCV-2023-Papers
ICCV 2023 Papers: cutting-edge research from ICCV 2023, the leading computer vision conference, covering the latest in computer vision and deep learning, with code included.
AILab-CVC/UniRepLKNet
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
PreferredAI/cornac
A Comparative Framework for Multimodal Recommender Systems
ArrowLuo/CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
declare-lab/multimodal-deep-learning
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
henghuiding/ReLA
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
georgian-io/Multimodal-Toolkit
Multimodal model for text and tabular data, with HuggingFace transformers as the building block for the text data
njustkmg/OMML
Multi-modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval, and image captioning.
subho406/OmniNet
Official Pytorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain
henghuiding/MeViS
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
microsoft/XPretrain
Multi-modality pre-training
pliang279/MultiBench
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
pykale/pykale
Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the PyTorch ecosystem.
sangminwoo/awesome-vision-and-language
A curated list of awesome vision and language resources (still under construction... stay tuned!)
richard-peng-xia/awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
kyegomez/CM3Leon
An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", a multimodal model that uses a decoder-only architecture to generate both text and images
HenryHZY/Awesome-Multimodal-LLM
Research Trends in LLM-guided Multimodal Learning.
mmaaz60/mvits_for_class_agnostic_od
[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".
UCSC-VLAA/CLIPA
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
ilaria-manco/multimodal-ml-music
List of academic resources on Multimodal ML for Music
Pointcept/GPT4Point
[CVPR'24 Highlight] GPT4Point: A Unified Framework for Point-Language Understanding and Generation.
DmitryRyumin/ICASSP-2023-24-Papers
ICASSP 2023-2024 Papers: a complete collection of influential research papers from the ICASSP 2023-24 conferences, covering the latest advancements in acoustics, speech, and signal processing, with code included.
HUANGLIZI/LViT
[IEEE Transactions on Medical Imaging (TMI)] Official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
ys-zong/awesome-self-supervised-multimodal-learning
A curated list of self-supervised multimodal learning resources.
snap-research/MMVID
[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
antoyang/TubeDETR
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
antoyang/VidChapters
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
mhw32/multimodal-vae-public
A PyTorch implementation of "Multimodal Generative Models for Scalable Weakly-Supervised Learning" (https://arxiv.org/abs/1802.05335)
antoyang/FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
YiLunLee/missing_aware_prompts
Multimodal Prompting with Missing Modalities for Visual Recognition, CVPR'23
OFA-Sys/OFASys
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models