multimodal-deep-learning
There are 391 repositories under the multimodal-deep-learning topic.
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Yutong-Zhou-cv/Awesome-Text-to-Image
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
kyegomez/BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
AI4Finance-Foundation/FinRobot
FinRobot: An Open-Source AI Agent Platform for Financial Analysis using LLMs 🚀 🚀 🚀
AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
jrzaurin/pytorch-widedeep
A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch
DWCTOD/CVPR2024-Papers-with-Code-Demo
Collect the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos; recommendations welcome!
yuewang-cuhk/awesome-vision-language-pretraining-papers
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
declare-lab/multimodal-deep-learning
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
omriav/blended-latent-diffusion
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]
richard-peng-xia/awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
jianghaojun/Awesome-Parameter-Efficient-Transfer-Learning
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
theislab/scarches
Reference mapping for single-cell genomics
kyegomez/Med-PaLM
Towards Generalist Biomedical AI
Yutong-Zhou-cv/Awesome-Multimodality
A survey on multimodal learning research.
fcakyon/content-moderation-deep-learning
Deep learning based content moderation from text, audio, video & image input modalities.
soujanyaporia/MUStARD
Multimodal Sarcasm Detection Dataset
westlake-repl/Recommendation-Systems-without-Explicit-ID-Features-A-Literature-Review
Paper List of Pre-trained Foundation Recommender Models
sail-sg/CLoT
[CVPR 2024] Official codebase for the paper "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation"
phellonchen/awesome-Vision-and-Language-Pre-training
Recent Advances in Vision and Language Pre-training (VLP)
DWCTOD/ECCV2022-Papers-with-Code-Demo
Collect the latest ECCV (European Conference on Computer Vision) results, including papers, code, and demo videos; recommendations welcome!
ilaria-manco/multimodal-ml-music
List of academic resources on Multimodal ML for Music
MILVLG/prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
yuanze-lin/Learnable_Regions
[CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions"
david-yoon/multimodal-speech-emotion
TensorFlow implementation of "Multimodal Speech Emotion Recognition using Audio and Text," IEEE SLT-18
declare-lab/awesome-emotion-recognition-in-conversations
A comprehensive reading list for Emotion Recognition in Conversations
YuanGongND/cav-mae
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
drprojects/DeepViewAgg
[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
remyxai/VQASynth
Compose multimodal datasets 🎹
geoaigroup/awesome-vision-language-models-for-earth-observation
A curated list of awesome vision and language resources for earth observation.
AnkurDeria/MFT
PyTorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.
DavidHuji/CapDec
CapDec: SOTA zero-shot image captioning using CLIP and GPT2, EMNLP 2022 (Findings)
kyegomez/NaViT
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
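A recurring pattern across the repositories above (e.g. declare-lab/multimodal-deep-learning and jrzaurin/pytorch-widedeep) is multimodal fusion: each modality is encoded into a feature vector, the vectors are combined, and a task head produces predictions. As a rough, library-agnostic illustration (not code from any listed repository; the feature values and weights below are hypothetical toy numbers), late fusion by concatenation can be sketched in plain Python:

```python
# Minimal late-fusion sketch: per-modality feature vectors are concatenated,
# then a single linear layer maps the fused vector to task scores.
# All features and weights here are hypothetical toy values.

def concat_fusion(text_feats, image_feats):
    """Late fusion: concatenate per-modality feature vectors."""
    return list(text_feats) + list(image_feats)

def linear_head(features, weights, bias):
    """Linear task head: scores[j] = sum_i features[i] * weights[i][j] + bias[j]."""
    n_out = len(bias)
    return [
        sum(f * weights[i][j] for i, f in enumerate(features)) + bias[j]
        for j in range(n_out)
    ]

# Toy example: 2-dim "text" features, 2-dim "image" features, 2 output classes.
fused = concat_fusion([1.0, 0.5], [0.2, -0.1])  # 4-dim fused vector
W = [[1, 0], [0, 1], [1, 1], [0, 0]]            # 4x2 weight matrix
b = [0.0, 0.0]
scores = linear_head(fused, W, b)
print(scores)  # [1.2, 0.7]
```

In the listed projects these stand-ins are replaced by learned encoders (e.g. a text transformer and an image CNN or ViT), and the fusion head is trained end to end; many also use richer fusion schemes than concatenation, such as cross-attention.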