This repository is built in association with our position paper, "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers".
As part of this release, we share information about recent multimodal datasets that are available for research purposes.
We found that although 100+ multimodal language resources are reported in the literature for various NLP tasks, publicly available multimodal datasets remain under-explored for re-use in subsequent problem domains.
# Multimodal datasets for NLP Applications

## Sentiment Analysis

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| EmoDB | A Database of German Emotional Speech | Paper | Dataset |
| VAM | The Vera am Mittag German Audio-Visual Emotional Speech Database | Paper | Dataset |
| IEMOCAP | IEMOCAP: interactive emotional dyadic motion capture database | Paper | Dataset |
| Mimicry | A Multimodal Database for Mimicry Analysis | Paper | Dataset |
| YouTube | Towards Multimodal Sentiment Analysis: Harvesting Opinions from the Web | Paper | Dataset |
| HUMAINE | The HUMAINE database | Paper | Dataset |
| Large Movies | Sentiment classification on Large Movie Review | Paper | Dataset |
| SEMAINE | The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent | Paper | Dataset |
| AFEW | Collecting Large, Richly Annotated Facial-Expression Databases from Movies | Paper | Dataset |
| SST | Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank | Paper | Dataset |
| ICT-MMMO | YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context | Paper | Dataset |
| RECOLA | Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions | Paper | Dataset |
| MOUD | Utterance-Level Multimodal Sentiment Analysis | Paper | - |
| CMU-MOSI | MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos | Paper | Dataset |
| POM | Multimodal Analysis and Prediction of Persuasiveness in Online Social Multimedia | Paper | Dataset |
| MELD | MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations | Paper | Dataset |
| CMU-MOSEI | Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph | Paper | Dataset |
| AMMER | Towards Multimodal Emotion Recognition in German Speech Events in Cars using Transfer Learning | Paper | On Request |
| SEWA | SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild | Paper | Dataset |
| Fakeddit | r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection | Paper | Dataset |
| CMU-MOSEAS | CMU-MOSEAS: A Multimodal Language Dataset for Spanish, Portuguese, German and French | Paper | Dataset |
| MultiOFF | Multimodal Meme Dataset (MultiOFF) for Identifying Offensive Content in Image and Text | Paper | Dataset |
| MEISD | MEISD: A Multimodal Multi-Label Emotion, Intensity and Sentiment Dialogue Dataset for Emotion Recognition and Sentiment Analysis in Conversations | Paper | Dataset |
| TASS | Overview of TASS 2020: Introducing Emotion | Paper | Dataset |
| CH-SIMS | CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality | Paper | Dataset |
| Creep-Image | A Multimodal Dataset of Images and Text | Paper | Dataset |
| Entheos | Entheos: A Multimodal Dataset for Studying Enthusiasm | Paper | Dataset |
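Many of the conversational corpora above ship their sentiment and emotion labels as plain CSV files next to the raw audio/video. Below is a minimal sketch of reading such a label file with pandas; the file name (`train_sent_emo.csv`) and column names (`Utterance`, `Emotion`, `Sentiment`) are assumptions based on MELD's public release, so adjust them to whatever the downloaded archive actually contains.

```python
# Minimal sketch: reading MELD-style utterance labels with pandas.
# The file and column names below are assumptions based on MELD's
# public release, not guarantees made by this repository.
import pandas as pd

df = pd.read_csv("MELD/train_sent_emo.csv")  # assumed path/filename

# Each row is one utterance in a dialogue, annotated with an emotion
# label and a coarse sentiment polarity.
for _, row in df.head(3).iterrows():
    print(row["Utterance"], "->", row["Emotion"], "/", row["Sentiment"])
```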
## Machine Translation

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| Multi30K | Multi30K: Multilingual English-German Image Descriptions | Paper | Dataset |
| How2 | How2: A Large-scale Dataset for Multimodal Language Understanding | Paper | Dataset |
| MLT | Multimodal Lexical Translation | Paper | Dataset |
| IKEA | A Visual Attention Grounding Neural Model for Multimodal Machine Translation | Paper | Dataset |
| Flickr30K (EN-hi-IN) | Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data | Paper | On Request |
| Hindi Visual Genome | Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation | Paper | Dataset |
| HowTo100M | Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models | Paper | Dataset |
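The image-grounded translation corpora above are typically distributed as aligned plain-text files: one caption per line, with line *i* in every file referring to the same image. Below is a minimal sketch for Multi30K, where the file names (`train.en`, `train.de`, `train_images.txt`) and the local path are assumptions based on its public release layout, not guarantees.

```python
# Minimal sketch: pairing Multi30K English/German captions with their
# Flickr30K image names. File names and the local path are assumptions
# based on Multi30K's public release; adjust to the actual download.
from pathlib import Path

root = Path("multi30k/data/task1")  # hypothetical local path

en = (root / "train.en").read_text(encoding="utf-8").splitlines()
de = (root / "train.de").read_text(encoding="utf-8").splitlines()
images = (root / "train_images.txt").read_text(encoding="utf-8").splitlines()

# Line i of each file describes the same image.
for image, src, tgt in list(zip(images, en, de))[:3]:
    print(image, "|", src, "|", tgt)
```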
## Information Retrieval

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| MusiCLEF | MusiCLEF: a Benchmark Activity in Multimodal Music Information Retrieval | Paper | Dataset |
| Moodo | The Moodo dataset: Integrating user context with emotional and color perception of music for affective music information retrieval | Paper | Dataset |
| ALF-200k | ALF-200k: Towards Extensive Multimodal Analyses of Music Tracks and Playlists | Paper | Dataset |
| MQA | Can Image Captioning Help Passage Retrieval in Multimodal Question Answering? | Paper | Dataset |
| WAT2019 | WAT2019: English-Hindi Translation on Hindi Visual Genome Dataset | Paper | Dataset |
| ViTT | Multimodal Pretraining for Dense Video Captioning | Paper | Dataset |
| MTD | MTD: A Multimodal Dataset of Musical Themes for MIR Research | Paper | Dataset |
| MusiClef | A professionally annotated and enriched multimodal data set on popular music | Paper | Dataset |
| Schubert Winterreise | Schubert Winterreise dataset: A multimodal scenario for music analysis | Paper | Dataset |
| WIT | WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning | Paper | Dataset |
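A common baseline over these music IR corpora is late fusion: score each item by a weighted sum of per-modality similarities between query and item features. The sketch below is illustrative only; the feature dimensions and the weight `w` are arbitrary choices, not values taken from any of the papers above.

```python
# Illustrative late-fusion retrieval baseline: rank items by a weighted
# sum of cosine similarities computed independently per modality.
import numpy as np

def cosine(query: np.ndarray, items: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each item row."""
    query = query / np.linalg.norm(query)
    items = items / np.linalg.norm(items, axis=1, keepdims=True)
    return items @ query

def rank(q_text, q_audio, i_text, i_audio, w: float = 0.5) -> np.ndarray:
    """Item indices sorted best-first by the fused similarity score."""
    scores = w * cosine(q_text, i_text) + (1 - w) * cosine(q_audio, i_audio)
    return np.argsort(-scores)

# Toy usage with random features (64-d text, 32-d audio, 100 items).
rng = np.random.default_rng(0)
print(rank(rng.normal(size=64), rng.normal(size=32),
           rng.normal(size=(100, 64)), rng.normal(size=(100, 32)))[:5])
```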
## Question Answering

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| MQA | A Dataset for Multimodal Question Answering in the Cultural Heritage Domain | Paper | - |
| MovieQA | MovieQA: Understanding Stories in Movies through Question-Answering | Paper | Dataset |
| PororoQA | DeepStory: Video Story QA by Deep Embedded Memory Networks | Paper | Dataset |
| MemexQA | MemexQA: Visual Memex Question Answering | Paper | Dataset |
| VQA | Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering | Paper | Dataset |
| TDIUC | An analysis of visual question answering algorithms | Paper | Dataset |
| TGIF-QA | TGIF-QA: Toward spatio-temporal reasoning in visual question answering | Paper | Dataset |
| MSVD-QA, MSRVTT-QA | Video question answering via attribute augmented attention network learning | Paper | Dataset |
| YouTube2Text | Video Question Answering via Gradually Refined Attention over Appearance and Motion | Paper | Dataset |
| MovieFIB | A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering | Paper | Dataset |
| Video Context QA | Uncovering the temporal context for video question answering | Paper | Dataset |
| MarioQA | MarioQA: Answering questions by watching gameplay videos | Paper | Dataset |
| TVQA | TVQA: Localized, compositional video question answering | Paper | Dataset |
| VQA-CP v2 | Don't just assume; look and answer: Overcoming priors for visual question answering | Paper | Dataset |
| RecipeQA | RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes | Paper | Dataset |
| GQA | GQA: A new dataset for real-world visual reasoning and compositional question answering | Paper | Dataset |
| Social IQ | Social-IQ: A question answering benchmark for artificial social intelligence | Paper | Dataset |
| MIMOQA | MIMOQA: Multimodal Input Multimodal Output Question Answering | Paper | - |
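Several of the benchmarks above (VQA, VQA-CP v2, TDIUC) collect ten human answers per question and score predictions with the consensus accuracy introduced in the VQA paper, commonly reported in the simplified form min(#matching humans / 3, 1). Below is a direct transcription of that simplified formula; note the official evaluation additionally normalizes answer strings and averages over annotator subsets, which is omitted here.

```python
# Simplified VQA consensus accuracy: an answer counts as fully correct
# when at least three of the ten human annotators gave it. The official
# script also normalizes answer strings and averages over annotator subsets.
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    matches = sum(answer == predicted for answer in human_answers)
    return min(matches / 3.0, 1.0)

print(vqa_accuracy("yes", ["yes"] * 4 + ["no"] * 6))  # 1.0
print(vqa_accuracy("no", ["yes"] * 8 + ["no"] * 2))   # 0.67 (i.e. 2/3)
```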
## Summarization

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| SumMe | Creating Summaries from User Videos | Paper | Dataset |
| TVSum | TVSum: Summarizing web videos using titles | Paper | Dataset |
| QFVS | Query-focused video summarization: Dataset, evaluation, and a memory network based approach | Paper | Dataset |
| MMSS | Multi-modal Sentence Summarization with Modality Attention and Image Filtering | Paper | - |
| MSMO | MSMO: Multimodal Summarization with Multimodal Output | Paper | - |
| Screen2Words | Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning | Paper | Dataset |
| AVIATE | See, Hear, Read: Leveraging Multimodality with Guided Attention for Abstractive Text Summarization | Paper | Dataset |
| Multimodal Microblog Summarization | On Multimodal Microblog Summarization | Paper | - |
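Summaries built on SumMe and TVSum are conventionally scored with an F-measure between the machine-selected frames and user-annotated frames. The sketch below operates on boolean per-frame masks and is a simplification: the official protocols also handle multiple annotators and a summary length budget.

```python
# Keyframe-overlap F-score between a machine summary and one user summary,
# both given as boolean per-frame selection masks. A simplification of the
# SumMe/TVSum protocols, which also handle multiple annotators and budgets.
import numpy as np

def summary_f_score(machine: np.ndarray, user: np.ndarray) -> float:
    overlap = np.logical_and(machine, user).sum()
    if overlap == 0:
        return 0.0
    precision = overlap / machine.sum()
    recall = overlap / user.sum()
    return 2 * precision * recall / (precision + recall)

machine = np.zeros(100, dtype=bool); machine[10:25] = True
user = np.zeros(100, dtype=bool); user[15:30] = True
print(round(summary_f_score(machine, user), 3))  # 0.667
```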
## Human Computer Interaction

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| CUAVE | CUAVE: A new audio-visual database for multimodal human-computer interface research | Paper | Dataset |
| MHAD | Berkeley MHAD: A comprehensive multimodal human action database | Paper | Dataset |
| Multi-party interactions | A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction | Paper | - |
| MHHRI | Multimodal Human-Human-Robot Interactions (MHHRI) Dataset for Studying Personality and Engagement | Paper | Dataset |
| Red Hen Lab | Red Hen Lab: Dataset and Tools for Multimodal Human Communication Research | Paper | - |
| EMRE | Generating a Novel Dataset of Multimodal Referring Expressions | Paper | Dataset |
| Chinese Whispers | Chinese Whispers: A Multimodal Dataset for Embodied Language Grounding | Paper | Dataset |
| uulmMAC | The uulmMAC Database: A Multimodal Affective Corpus for Affective Computing in Human-Computer Interaction | Paper | Dataset |
## Semantic Analysis

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| WN9-IMG | Image-embodied Knowledge Representation Learning | Paper | Dataset |
| Wikimedia Commons | A Dataset and Reranking Method for Multimodal MT of User-Generated Image Captions | Paper | Dataset |
| Starsem18-multimodalKB | A Multimodal Translation-Based Approach for Knowledge Graph Representation Learning | Paper | Dataset |
| MUStARD | Towards Multimodal Sarcasm Detection | Paper | Dataset |
| YouMakeup | YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension | Paper | Dataset |
| MDID | Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts | Paper | Dataset |
| Social media posts from Flickr (Mental Health) | Inferring Social Media Users' Mental Health Status from Multimodal Information | Paper | Dataset |
| Twitter MEL | Building a Multimodal Entity Linking Dataset From Tweets | Paper | Dataset |
| MultiMET | MultiMET: A Multimodal Dataset for Metaphor Understanding | Paper | - |
| MSDS | Multimodal Sarcasm Detection in Spanish: a Dataset and a Baseline | Paper | Dataset |
## Miscellaneous

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| MS COCO | Microsoft COCO: Common objects in context | Paper | Dataset |
| ILSVRC | ImageNet Large Scale Visual Recognition Challenge | Paper | Dataset |
| YFCC100M | YFCC100M: The new data in multimedia research | Paper | Dataset |
| COGNIMUSE | COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization | Paper | Dataset |
| SNAG | SNAG: Spoken Narratives and Gaze Dataset | Paper | Dataset |
| UR-FUNNY | UR-FUNNY: A Multimodal Language Dataset for Understanding Humor | Paper | Dataset |
| Bag-of-Lies | Bag-of-Lies: A Multimodal Dataset for Deception Detection | Paper | Dataset |
| MARC | A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks | Paper | Dataset |
| MuSE | MuSE: a Multimodal Dataset of Stressed Emotion | Paper | Dataset |
| BabelPic | Fatality Killed the Cat or: BabelPic, a Multimodal Dataset for Non-Concrete Concepts | Paper | Dataset |
| Eye4Ref | Eye4Ref: A Multimodal Eye Movement Dataset of Referentially Complex Situations | Paper | - |
| Troll Memes | A Dataset for Troll Classification of TamilMemes | Paper | Dataset |
| SEMD | EmoSen: Generating sentiment and emotion controlled responses in a multimodal dialogue system | Paper | - |
| Chat-talk Corpus | Construction and Analysis of a Multimodal Chat-talk Corpus for Dialog Systems Considering Interpersonal Closeness | Paper | - |
| EMOTyDA | Towards Emotion-aided Multi-modal Dialogue Act Classification | Paper | Dataset |
| MELINDA | MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification | Paper | Dataset |
| NewsCLIPpings | NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media | Paper | Dataset |
| R2VQ | Designing Multimodal Datasets for NLP Challenges | Paper | Dataset |
| M2H2 | M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in Conversations | Paper | Dataset |