multimodal-learning
There are 239 repositories under the multimodal-learning topic.
pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
Eurus-Holmes/Awesome-Multimodal-Research
A curated list of Multimodal Related Research.
DmitryRyumin/ICCV-2023-Papers
ICCV 2023 Papers: cutting-edge research from ICCV 2023, the leading computer vision conference, covering the latest in computer vision and deep learning, with code included.
AILab-CVC/UniRepLKNet
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
PreferredAI/cornac
A Comparative Framework for Multimodal Recommender Systems
ArrowLuo/CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
declare-lab/multimodal-deep-learning
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
henghuiding/ReLA
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
georgian-io/Multimodal-Toolkit
Multimodal model for text and tabular data, with HuggingFace transformers as the building block for the text data
njustkmg/OMML
Multi-modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval, and image captioning.
subho406/OmniNet
Official Pytorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain
henghuiding/MeViS
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
microsoft/XPretrain
Multi-modality pre-training
pliang279/MultiBench
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
pykale/pykale
Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the PyTorch ecosystem.
sangminwoo/awesome-vision-and-language
A curated list of awesome vision and language resources (still under construction... stay tuned!)
richard-peng-xia/awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
kyegomez/CM3Leon
An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", a multimodal model that uses a decoder-only architecture to generate both text and images
HenryHZY/Awesome-Multimodal-LLM
Research Trends in LLM-guided Multimodal Learning.
mmaaz60/mvits_for_class_agnostic_od
[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".
UCSC-VLAA/CLIPA
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
ilaria-manco/multimodal-ml-music
List of academic resources on Multimodal ML for Music
Pointcept/GPT4Point
[CVPR'24 Highlight] GPT4Point: A Unified Framework for Point-Language Understanding and Generation.
DmitryRyumin/ICASSP-2023-24-Papers
ICASSP 2023-2024 Papers: a complete collection of influential research papers from the ICASSP 2023-24 conferences, covering the latest advancements in acoustics, speech, and signal processing, with code included.
HUANGLIZI/LViT
[IEEE Transactions on Medical Imaging (TMI)] Official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
ys-zong/awesome-self-supervised-multimodal-learning
A curated list of self-supervised multimodal learning resources.
snap-research/MMVID
[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
antoyang/TubeDETR
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
antoyang/VidChapters
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
mhw32/multimodal-vae-public
A PyTorch implementation of "Multimodal Generative Models for Scalable Weakly-Supervised Learning" (https://arxiv.org/abs/1802.05335)
antoyang/FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
YiLunLee/missing_aware_prompts
Multimodal Prompting with Missing Modalities for Visual Recognition, CVPR'23
OFA-Sys/OFASys
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models