This repository is built in association with our position paper, "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers".
As part of this release, we share information about recent multimodal datasets that are available for research purposes.
We found that although 100+ multimodal language resources are reported in the literature for various NLP tasks, publicly available multimodal datasets remain under-explored for re-use in subsequent problem domains.
# Multimodal datasets for NLP Applications

## Sentiment Analysis

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| EmoDB | A Database of German Emotional Speech | Paper | Dataset |
| VAM | The Vera am Mittag German Audio-Visual Emotional Speech Database | Paper | Dataset |
| IEMOCAP | IEMOCAP: interactive emotional dyadic motion capture database | Paper | Dataset |
| Mimicry | A Multimodal Database for Mimicry Analysis | Paper | Dataset |
| YouTube | Towards Multimodal Sentiment Analysis: Harvesting Opinions from the Web | Paper | Dataset |
| HUMAINE | The HUMAINE database | Paper | Dataset |
| Large Movies | Sentiment classification on Large Movie Review | Paper | Dataset |
| SEMAINE | The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent | Paper | Dataset |
| AFEW | Collecting Large, Richly Annotated Facial-Expression Databases from Movies | Paper | Dataset |
| SST | Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank | Paper | Dataset |
| ICT-MMMO | YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context | Paper | Dataset |
| RECOLA | Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions | Paper | Dataset |
| MOUD | Utterance-Level Multimodal Sentiment Analysis | Paper | - |
| CMU-MOSI | MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos | Paper | Dataset |
| POM | Multimodal Analysis and Prediction of Persuasiveness in Online Social Multimedia | Paper | Dataset |
| MELD | MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations | Paper | Dataset |
| CMU-MOSEI | Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph | Paper | Dataset |
| AMMER | Towards Multimodal Emotion Recognition in German Speech Events in Cars using Transfer Learning | Paper | On Request |
| SEWA | SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild | Paper | Dataset |
| Fakeddit | r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection | Paper | Dataset |
| CMU-MOSEAS | CMU-MOSEAS: A Multimodal Language Dataset for Spanish, Portuguese, German and French | Paper | Dataset |
| MultiOFF | Multimodal Meme Dataset (MultiOFF) for Identifying Offensive Content in Image and Text | Paper | Dataset |
| MEISD | MEISD: A Multimodal Multi-Label Emotion, Intensity and Sentiment Dialogue Dataset for Emotion Recognition and Sentiment Analysis in Conversations | Paper | Dataset |
| TASS | Overview of TASS 2020: Introducing Emotion | Paper | Dataset |
| CH-SIMS | CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotation of Modality | Paper | Dataset |
| Creep-Image | A Multimodal Dataset of Images and Text | Paper | Dataset |
| Entheos | Entheos: A Multimodal Dataset for Studying Enthusiasm | Paper | Dataset |
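Many of the conversational corpora above ship their sentiment and emotion labels as plain CSV files next to the raw audio/video. Below is a minimal sketch of reading such a label file with pandas; the file name (`train_sent_emo.csv`) and column names (`Utterance`, `Emotion`, `Sentiment`) are assumptions based on MELD's public release, so adjust them to whatever the downloaded archive actually contains.

```python
# Minimal sketch: reading MELD-style utterance labels with pandas.
# The file and column names below are assumptions based on MELD's
# public release, not guarantees made by this repository.
import pandas as pd

df = pd.read_csv("MELD/train_sent_emo.csv")  # assumed path/filename

# Each row is one utterance in a dialogue, annotated with an emotion
# label and a coarse sentiment polarity.
for _, row in df.head(3).iterrows():
    print(row["Utterance"], "->", row["Emotion"], "/", row["Sentiment"])
```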
## Machine Translation

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| Multi30K | Multi30K: Multilingual English-German Image Descriptions | Paper | Dataset |
| How2 | How2: A Large-scale Dataset for Multimodal Language Understanding | Paper | Dataset |
| MLT | Multimodal Lexical Translation | Paper | Dataset |
| IKEA | A Visual Attention Grounding Neural Model for Multimodal Machine Translation | Paper | Dataset |
| Flickr30K (EN-hi-IN) | Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data | Paper | On Request |
| Hindi Visual Genome | Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation | Paper | Dataset |
| HowTo100M | Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models | Paper | Dataset |
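The image-grounded translation corpora above are typically distributed as aligned plain-text files: one caption per line, with line *i* in every file referring to the same image. Below is a minimal sketch for Multi30K, where the file names (`train.en`, `train.de`, `train_images.txt`) and the local path are assumptions based on its public release layout, not guarantees.

```python
# Minimal sketch: pairing Multi30K English/German captions with their
# Flickr30K image names. File names and the local path are assumptions
# based on Multi30K's public release; adjust to the actual download.
from pathlib import Path

root = Path("multi30k/data/task1")  # hypothetical local path

en = (root / "train.en").read_text(encoding="utf-8").splitlines()
de = (root / "train.de").read_text(encoding="utf-8").splitlines()
images = (root / "train_images.txt").read_text(encoding="utf-8").splitlines()

# Line i of each file describes the same image.
for image, src, tgt in list(zip(images, en, de))[:3]:
    print(image, "|", src, "|", tgt)
```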
## Information Retrieval

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| MusiCLEF | MusiCLEF: a Benchmark Activity in Multimodal Music Information Retrieval | Paper | Dataset |
| Moodo | The Moodo dataset: Integrating user context with emotional and color perception of music for affective music information retrieval | Paper | Dataset |
| ALF-200k | ALF-200k: Towards Extensive Multimodal Analyses of Music Tracks and Playlists | Paper | Dataset |
| MQA | Can Image Captioning Help Passage Retrieval in Multimodal Question Answering? | Paper | Dataset |
| WAT2019 | WAT2019: English-Hindi Translation on Hindi Visual Genome Dataset | Paper | Dataset |
| ViTT | Multimodal Pretraining for Dense Video Captioning | Paper | Dataset |
| MTD | MTD: A Multimodal Dataset of Musical Themes for MIR Research | Paper | Dataset |
| MusiClef | A professionally annotated and enriched multimodal data set on popular music | Paper | Dataset |
| Schubert Winterreise | Schubert Winterreise dataset: A multimodal scenario for music analysis | Paper | Dataset |
| WIT | WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning | Paper | Dataset |
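A common baseline over these music IR corpora is late fusion: score each item by a weighted sum of per-modality similarities between query and item features. The sketch below is illustrative only; the feature dimensions and the weight `w` are arbitrary choices, not values taken from any of the papers above.

```python
# Illustrative late-fusion retrieval baseline: rank items by a weighted
# sum of cosine similarities computed independently per modality.
import numpy as np

def cosine(query: np.ndarray, items: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each item row."""
    query = query / np.linalg.norm(query)
    items = items / np.linalg.norm(items, axis=1, keepdims=True)
    return items @ query

def rank(q_text, q_audio, i_text, i_audio, w: float = 0.5) -> np.ndarray:
    """Item indices sorted best-first by the fused similarity score."""
    scores = w * cosine(q_text, i_text) + (1 - w) * cosine(q_audio, i_audio)
    return np.argsort(-scores)

# Toy usage with random features (64-d text, 32-d audio, 100 items).
rng = np.random.default_rng(0)
print(rank(rng.normal(size=64), rng.normal(size=32),
           rng.normal(size=(100, 64)), rng.normal(size=(100, 32)))[:5])
```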
## Question Answering

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| MQA | A Dataset for Multimodal Question Answering in the Cultural Heritage Domain | Paper | - |
| MovieQA | MovieQA: Understanding Stories in Movies through Question-Answering | Paper | Dataset |
| PororoQA | DeepStory: Video Story QA by Deep Embedded Memory Networks | Paper | Dataset |
| MemexQA | MemexQA: Visual Memex Question Answering | Paper | Dataset |
| VQA | Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering | Paper | Dataset |
| TDIUC | An analysis of visual question answering algorithms | Paper | Dataset |
| TGIF-QA | TGIF-QA: Toward spatio-temporal reasoning in visual question answering | Paper | Dataset |
| MSVD-QA, MSRVTT-QA | Video question answering via attribute augmented attention network learning | Paper | Dataset |
| YouTube2Text | Video Question Answering via Gradually Refined Attention over Appearance and Motion | Paper | Dataset |
| MovieFIB | A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering | Paper | Dataset |
| Video Context QA | Uncovering the temporal context for video question answering | Paper | Dataset |
| MarioQA | MarioQA: Answering questions by watching gameplay videos | Paper | Dataset |
| TVQA | TVQA: Localized, compositional video question answering | Paper | Dataset |
| VQA-CP v2 | Don't just assume; look and answer: Overcoming priors for visual question answering | Paper | Dataset |
| RecipeQA | RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes | Paper | Dataset |
| GQA | GQA: A new dataset for real-world visual reasoning and compositional question answering | Paper | Dataset |
| Social IQ | Social-IQ: A question answering benchmark for artificial social intelligence | Paper | Dataset |
| MIMOQA | MIMOQA: Multimodal Input Multimodal Output Question Answering | Paper | - |
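Several of the benchmarks above (VQA, VQA-CP v2, TDIUC) collect ten human answers per question and score predictions with the consensus accuracy introduced in the VQA paper, commonly reported in the simplified form min(#matching humans / 3, 1). Below is a direct transcription of that simplified formula; note the official evaluation additionally normalizes answer strings and averages over annotator subsets, which is omitted here.

```python
# Simplified VQA consensus accuracy: an answer counts as fully correct
# when at least three of the ten human annotators gave it. The official
# script also normalizes answer strings and averages over annotator subsets.
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    matches = sum(answer == predicted for answer in human_answers)
    return min(matches / 3.0, 1.0)

print(vqa_accuracy("yes", ["yes"] * 4 + ["no"] * 6))  # 1.0
print(vqa_accuracy("no", ["yes"] * 8 + ["no"] * 2))   # 0.67 (i.e. 2/3)
```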
## Summarization

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| SumMe | Creating Summaries from User Videos | Paper | Dataset |
| TVSum | TVSum: Summarizing web videos using titles | Paper | Dataset |
| QFVS | Query-focused video summarization: Dataset, evaluation, and a memory network based approach | Paper | Dataset |
| MMSS | Multi-modal Sentence Summarization with Modality Attention and Image Filtering | Paper | - |
| MSMO | MSMO: Multimodal Summarization with Multimodal Output | Paper | - |
| Screen2Words | Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning | Paper | Dataset |
| AVIATE | See, Hear, Read: Leveraging Multimodality with Guided Attention for Abstractive Text Summarization | Paper | Dataset |
| Multimodal Microblog Summarization | On Multimodal Microblog Summarization | Paper | - |
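Summaries built on SumMe and TVSum are conventionally scored with an F-measure between the machine-selected frames and user-annotated frames. The sketch below operates on boolean per-frame masks and is a simplification: the official protocols also handle multiple annotators and a summary length budget.

```python
# Keyframe-overlap F-score between a machine summary and one user summary,
# both given as boolean per-frame selection masks. A simplification of the
# SumMe/TVSum protocols, which also handle multiple annotators and budgets.
import numpy as np

def summary_f_score(machine: np.ndarray, user: np.ndarray) -> float:
    overlap = np.logical_and(machine, user).sum()
    if overlap == 0:
        return 0.0
    precision = overlap / machine.sum()
    recall = overlap / user.sum()
    return 2 * precision * recall / (precision + recall)

machine = np.zeros(100, dtype=bool); machine[10:25] = True
user = np.zeros(100, dtype=bool); user[15:30] = True
print(round(summary_f_score(machine, user), 3))  # 0.667
```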
## Human Computer Interaction

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| CUAVE | CUAVE: A new audio-visual database for multimodal human-computer interface research | Paper | Dataset |
| MHAD | Berkeley MHAD: A comprehensive multimodal human action database | Paper | Dataset |
| Multi-party interactions | A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction | Paper | - |
| MHHRI | Multimodal Human-Human-Robot Interactions (MHHRI) Dataset for Studying Personality and Engagement | Paper | Dataset |
| Red Hen Lab | Red Hen Lab: Dataset and Tools for Multimodal Human Communication Research | Paper | - |
| EMRE | Generating a Novel Dataset of Multimodal Referring Expressions | Paper | Dataset |
| Chinese Whispers | Chinese Whispers: A Multimodal Dataset for Embodied Language Grounding | Paper | Dataset |
| uulmMAC | The uulmMAC Database: A Multimodal Affective Corpus for Affective Computing in Human-Computer Interaction | Paper | Dataset |
## Semantic Analysis

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| WN9-IMG | Image-embodied Knowledge Representation Learning | Paper | Dataset |
| Wikimedia Commons | A Dataset and Reranking Method for Multimodal MT of User-Generated Image Captions | Paper | Dataset |
| Starsem18-multimodalKB | A Multimodal Translation-Based Approach for Knowledge Graph Representation Learning | Paper | Dataset |
| MUStARD | Towards Multimodal Sarcasm Detection | Paper | Dataset |
| YouMakeup | YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension | Paper | Dataset |
| MDID | Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts | Paper | Dataset |
| Social media posts from Flickr (Mental Health) | Inferring Social Media Users' Mental Health Status from Multimodal Information | Paper | Dataset |
| Twitter MEL | Building a Multimodal Entity Linking Dataset From Tweets | Paper | Dataset |
| MultiMET | MultiMET: A Multimodal Dataset for Metaphor Understanding | Paper | - |
| MSDS | Multimodal Sarcasm Detection in Spanish: a Dataset and a Baseline | Paper | Dataset |
## Miscellaneous

| Dataset | Title of the Paper | Link of the Paper | Link of the Dataset |
| --- | --- | --- | --- |
| MS COCO | Microsoft COCO: Common objects in context | Paper | Dataset |
| ILSVRC | ImageNet Large Scale Visual Recognition Challenge | Paper | Dataset |
| YFCC100M | YFCC100M: The new data in multimedia research | Paper | Dataset |
| COGNIMUSE | COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization | Paper | Dataset |
| SNAG | SNAG: Spoken Narratives and Gaze Dataset | Paper | Dataset |
| UR-FUNNY | UR-FUNNY: A Multimodal Language Dataset for Understanding Humor | Paper | Dataset |
| Bag-of-Lies | Bag-of-Lies: A Multimodal Dataset for Deception Detection | Paper | Dataset |
| MARC | A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks | Paper | Dataset |
| MuSE | MuSE: a Multimodal Dataset of Stressed Emotion | Paper | Dataset |
| BabelPic | Fatality Killed the Cat or: BabelPic, a Multimodal Dataset for Non-Concrete Concepts | Paper | Dataset |
| Eye4Ref | Eye4Ref: A Multimodal Eye Movement Dataset of Referentially Complex Situations | Paper | - |
| Troll Memes | A Dataset for Troll Classification of TamilMemes | Paper | Dataset |
| SEMD | EmoSen: Generating sentiment and emotion controlled responses in a multimodal dialogue system | Paper | - |
| Chat-talk Corpus | Construction and Analysis of a Multimodal Chat-talk Corpus for Dialog Systems Considering Interpersonal Closeness | Paper | - |
| EMOTyDA | Towards Emotion-aided Multi-modal Dialogue Act Classification | Paper | Dataset |
| MELINDA | MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification | Paper | Dataset |
| NewsCLIPpings | NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media | Paper | Dataset |
| R2VQ | Designing Multimodal Datasets for NLP Challenges | Paper | Dataset |
| M2H2 | M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in Conversations | Paper | Dataset |