multimodal-deep-learning
There are 391 repositories under the multimodal-deep-learning topic.
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Yutong-Zhou-cv/Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
kyegomez/BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
AI4Finance-Foundation/FinRobot
FinRobot: An Open-Source AI Agent Platform for Financial Analysis using LLMs 🚀 🚀 🚀
AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
jrzaurin/pytorch-widedeep
A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch
DWCTOD/CVPR2024-Papers-with-Code-Demo
Collect the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos; recommendations welcome!
yuewang-cuhk/awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
declare-lab/multimodal-deep-learning
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
omriav/blended-latent-diffusion
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]
richard-peng-xia/awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
jianghaojun/Awesome-Parameter-Efficient-Transfer-Learning
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
theislab/scarches
Reference mapping for single-cell genomics
kyegomez/Med-PaLM
Towards Generalist Biomedical AI
Yutong-Zhou-cv/Awesome-Multimodality
A survey on multimodal learning research.
fcakyon/content-moderation-deep-learning
Deep learning based content moderation from text, audio, video & image input modalities.
soujanyaporia/MUStARD
Multimodal Sarcasm Detection Dataset
westlake-repl/Recommendation-Systems-without-Explicit-ID-Features-A-Literature-Review
Paper List of Pre-trained Foundation Recommender Models
sail-sg/CLoT
[CVPR 2024] Official codebase for the paper "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation"
phellonchen/awesome-Vision-and-Language-Pre-training
Recent Advances in Vision and Language Pre-training (VLP)
DWCTOD/ECCV2022-Papers-with-Code-Demo
Collect the latest ECCV (European Conference on Computer Vision) results, including papers, code, and demo videos; recommendations welcome!
ilaria-manco/multimodal-ml-music
List of academic resources on Multimodal ML for Music
MILVLG/prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
yuanze-lin/Learnable_Regions
[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"
david-yoon/multimodal-speech-emotion
TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18
declare-lab/awesome-emotion-recognition-in-conversations
A comprehensive reading list for Emotion Recognition in Conversations
YuanGongND/cav-mae
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
drprojects/DeepViewAgg
[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
remyxai/VQASynth
Compose multimodal datasets 🎹
geoaigroup/awesome-vision-language-models-for-earth-observation
A curated list of awesome vision and language resources for earth observation.
AnkurDeria/MFT
PyTorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.
DavidHuji/CapDec
CapDec: SOTA zero-shot image captioning using CLIP and GPT2, EMNLP 2022 (Findings)
kyegomez/NaViT
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
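A recurring pattern across the repositories above (e.g. declare-lab/multimodal-deep-learning and jrzaurin/pytorch-widedeep) is multimodal fusion: each modality is encoded into a feature vector, the vectors are combined, and a task head produces predictions. As a rough, library-agnostic illustration (not code from any listed repository; the feature values and weights below are hypothetical toy numbers), late fusion by concatenation can be sketched in plain Python:

```python
# Minimal late-fusion sketch: per-modality feature vectors are concatenated,
# then a single linear layer maps the fused vector to task scores.
# All features and weights here are hypothetical toy values.

def concat_fusion(text_feats, image_feats):
    """Late fusion: concatenate per-modality feature vectors."""
    return list(text_feats) + list(image_feats)

def linear_head(features, weights, bias):
    """Linear task head: scores[j] = sum_i features[i] * weights[i][j] + bias[j]."""
    n_out = len(bias)
    return [
        sum(f * weights[i][j] for i, f in enumerate(features)) + bias[j]
        for j in range(n_out)
    ]

# Toy example: 2-dim "text" features, 2-dim "image" features, 2 output classes.
fused = concat_fusion([1.0, 0.5], [0.2, -0.1])  # 4-dim fused vector
W = [[1, 0], [0, 1], [1, 1], [0, 0]]            # 4x2 weight matrix
b = [0.0, 0.0]
scores = linear_head(fused, W, b)
print(scores)  # [1.2, 0.7]
```

In the listed projects these stand-ins are replaced by learned encoders (e.g. a text transformer and an image CNN or ViT), and the fusion head is trained end to end; many also use richer fusion schemes than concatenation, such as cross-attention.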