/SER-datasets

A collection of datasets for the purpose of emotion recognition/detection in speech.

Primary language: HTML. License: MIT.

Spoken Emotion Recognition Datasets: The table below is ordered from newest to oldest and, for each dataset, describes its content and the emotions it covers.

| Dataset | Year | Content | Emotions | Format | Size | Language | Paper | Access | License |
|---|---|---|---|---|---|---|---|---|---|
| MSP-Podcast corpus | 2020 | 100 hours by over 100 speakers (see db link for details). | This corpus is annotated with emotional labels using attribute-based descriptors (activation, dominance and valence) and categorical labels (anger, happiness, sadness, disgust, surprised, fear, contempt, neutral and other). | Audio | -- | -- | The MSP-Conversation Corpus | Restricted access | Available under an Academic License & Commercial License |
| emotiontts_open_db | 2020 | Recordings and their associated transcriptions by a diverse group of speakers. | 4 emotions: general, joy, anger, and sadness. | Audio, Text | -- | Korean | -- | Partial open access | CC BY-NC-SA 4.0 |
| URDU-Dataset | 2020 | 400 utterances by 38 speakers (27 male and 11 female). | 4 emotions: angry, happy, neutral, and sad. | Audio | ~72.1 MB | Urdu | Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages | Open access | None specified |
| BAVED | 2020 | 1935 recordings by 61 speakers (45 male and 16 female). | 3 levels of emotion. | Audio | ~195 MB | Arabic | -- | Open access | None specified |
| VIVAE | 2020 | Non-speech; 1085 audio files by ~12 speakers. | Non-speech; 6 emotions: achievement, anger, fear, pain, pleasure, and surprise, with 4 emotional intensities (low, moderate, strong, peak). | Audio | -- | -- | -- | Restricted access | CC BY-NC-SA 4.0 |
| SEWA | 2019 | More than 2000 minutes of audio-visual data of 398 people (201 male and 197 female) from 6 cultures. | Emotions are characterized using valence and arousal. | Audio, Video | -- | Chinese, English, German, Greek, Hungarian and Serbian | SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild | Restricted access | EULA |
| MELD | 2019 | 1400 dialogues and 14000 utterances from the Friends TV series by multiple speakers. | 7 emotions: anger, disgust, sadness, joy, neutral, surprise and fear. MELD also has sentiment (positive, negative and neutral) annotation for each utterance. | Audio, Video, Text | ~10.1 GB | English | MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations | Open access | GPL-3.0 License |
| ShEMO | 2019 | 3000 semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data from online radio plays by 87 native-Persian speakers. | 6 emotions: anger, fear, happiness, sadness, neutral and surprise. | Audio | ~1014 MB | Persian | ShEMO: a large-scale validated database for Persian speech emotion detection | Open access | None specified |
| DEMoS | 2019 | 9365 emotional and 332 neutral samples produced by 68 native speakers (23 females, 45 males). | 7/6 emotions: anger, sadness, happiness, fear, surprise, disgust, and the secondary emotion guilt. | Audio | -- | Italian | DEMoS: An Italian emotional speech corpus. Elicitation methods, machine learning, and perception | Restricted access | EULA: End User License Agreement |
| AESDD | 2018 | Around 500 utterances by a diverse group of actors (over 5 actors) simulating various emotions. | 5 emotions: anger, disgust, fear, happiness, and sadness. | Audio | ~392 MB | Greek | Speech Emotion Recognition for Performance Interaction | Open access | None specified |
| Emov-DB | 2018 | Recordings for 4 speakers (2 males and 2 females). | The emotional styles are neutral, sleepiness, anger, disgust and amused. | Audio | 5.88 GB | English | The emotional voices database: Towards controlling the emotion dimension in voice generation systems | Open access | None specified |
| RAVDESS | 2018 | 7356 recordings by 24 actors. | 7 emotions: calm, happy, sad, angry, fearful, surprise, and disgust. | Audio, Video | ~24.8 GB | English | The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English | Open access | CC BY-NC-SA 4.0 |
| JL corpus | 2018 | 2400 recordings of 240 sentences by 4 actors (2 males and 2 females). | 5 primary emotions: angry, sad, neutral, happy, excited. 5 secondary emotions: anxious, apologetic, pensive, worried, enthusiastic. | Audio | -- | English | An Open Source Emotional Speech Corpus for Human Robot Interaction Applications | Open access | CC0 1.0 |
| CaFE | 2018 | 6 different sentences by 12 speakers (6 females + 6 males). | 7 emotions: happy, sad, angry, fearful, surprise, disgust and neutral. Each emotion is acted in 2 different intensities. | Audio | ~2 GB | French (Canadian) | -- | Open access | CC BY-NC-SA 4.0 |
| EmoFilm | 2018 | 1115 audio instances of sentences extracted from various films. | 5 emotions: anger, contempt, happiness, fear, and sadness. | Audio | -- | English, Italian & Spanish | Categorical vs Dimensional Perception of Italian Emotional Speech | Restricted access | EULA: End User License Agreement |
| ANAD | 2018 | 1384 recordings by multiple speakers. | 3 emotions: angry, happy, surprised. | Audio | ~2 GB | Arabic | Arabic Natural Audio Dataset | Open access | CC BY-NC-SA 4.0 |
| EmoSynth | 2018 | 144 audio files labelled by 40 listeners. | Emotion (no speech) defined in terms of valence and arousal. | Audio | 103.4 MB | -- | The Perceived Emotion of Isolated Synthetic Audio: The EmoSynth Dataset and Results | Open access | Creative Commons Attribution 4.0 International |
| CMU-MOSEI | 2018 | 65 hours of annotated video from more than 1000 speakers and 250 topics. | 6 emotions (happiness, sadness, anger, fear, disgust, surprise) + Likert scale. | Audio, Video | -- | English | Multi-attention Recurrent Network for Human Communication Comprehension | Open access | License |
| CMU-MOSI | 2017 | 2199 opinion utterances with annotated sentiment. | Sentiment annotated from very negative to very positive in seven Likert steps. | Audio, Video | -- | English | Multi-attention Recurrent Network for Human Communication Comprehension | Open access | License |
| MSP-IMPROV | 2017 | 20 sentences by 12 actors. | 4 emotions: angry, sad, happy, neutral; plus the labels other and without agreement. | Audio, Video | -- | English | MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception | Restricted access | Available under an Academic License & Commercial License |
| CREMA-D | 2017 | 7442 clips of 12 sentences spoken by 91 actors (48 males and 43 females). | 6 emotions: angry, disgusted, fearful, happy, neutral, and sad. | Audio, Video | -- | English | CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset | Open access | Available under the Open Database License & Database Content License |
| Example emotion videos used in investigation of emotion perception in schizophrenia | 2017 | 6 videos: two example videos from each emotion category (angry, happy and neutral) by one female speaker. | 3 emotions: angry, happy and neutral. | Audio, Video | ~63 MB | English | -- | Open access | Available under the Permitted Non-commercial Re-use with Acknowledgement |
| EMOVO | 2014 | 6 actors who played 14 sentences. | 6 emotions: disgust, fear, anger, joy, surprise, sadness. | Audio | ~355 MB | Italian | EMOVO Corpus: an Italian Emotional Speech Database | Open access | None specified |
| RECOLA | 2013 | 3.8 hours of recordings by 46 participants. | Negative and positive sentiment (valence and arousal). | Audio, Video | -- | -- | Introducing the RECOLA Multimodal Corpus of Remote Collaborative and Affective Interactions | Restricted access | Available under an Academic License & Commercial License |
| GEMEP corpus | 2012 | Videos of 10 actors portraying 10 states. | 12 emotions: amusement, anxiety, cold anger (irritation), despair, hot anger (rage), fear (panic), interest, joy (elation), pleasure (sensory), pride, relief, and sadness. Plus, 5 additional emotions: admiration, contempt, disgust, surprise, and tenderness. | Audio, Video | -- | French | Introducing the Geneva Multimodal Expression Corpus for Experimental Research on Emotion Perception | Restricted access | None specified |
| OGVC | 2012 | 9114 spontaneous utterances and 2656 acted utterances by 4 professional actors (two male and two female). | 9 emotional states: fear, surprise, sadness, disgust, anger, anticipation, joy, acceptance and the neutral state. | Audio | -- | Japanese | Naturalistic emotional speech collection paradigm with online game and its psychological and acoustical assessment | Restricted access | None specified |
| LEGO corpus | 2012 | 347 dialogs with 9,083 system-user exchanges. | Emotions classified as garbage, non-angry, slightly angry and very angry. | Audio | 1.1 GB | -- | A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let’s Go Bus Information System | Open access | License available with the data. Free of charge for research purposes only. |
| SAVEE | 2011 | 480 British English utterances by 4 male actors. | 7 emotions: anger, disgust, fear, happiness, sadness, surprise and neutral. | Audio, Video | -- | English (British) | Multimodal Emotion Recognition | Restricted access | Free of charge for research purposes only. |
| TESS | 2010 | 2800 recordings by 2 actresses. | 7 emotions: anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral. | Audio | -- | English | BEHAVIOURAL FINDINGS FROM THE TORONTO EMOTIONAL SPEECH SET | Open access | CC BY-NC-ND 4.0 |
| EEKK | 2007 | 26 text passages read by 10 speakers. | 4 main emotions: joy, sadness, anger and neutral. | -- | ~352 MB | Estonian | Estonian Emotional Speech Corpus | Open access | CC-BY license |
| IEMOCAP | 2007 | 12 hours of audiovisual data by 10 actors. | 5 emotions: happiness, anger, sadness, frustration and neutral. | -- | -- | English | IEMOCAP: Interactive emotional dyadic motion capture database | Restricted access | License |
| Keio-ESD | 2006 | A set of human speech with vocal emotion spoken by a Japanese male speaker. | 47 emotions including angry, joyful, disgusting, downgrading, funny, worried, gentle, relief, indignation, shameful, etc. | Audio | -- | Japanese | EMOTIONAL SPEECH SYNTHESIS USING SUBSPACE CONSTRAINTS IN PROSODY | Restricted access | Available for research purposes only |
| EMO-DB | 2005 | 800 recordings spoken by 10 actors (5 males and 5 females). | 7 emotions: anger, neutral, fear, boredom, happiness, sadness, disgust. | Audio | -- | German | A Database of German Emotional Speech | Open access | None specified |
| eNTERFACE05 | 2005 | Videos by 42 subjects, coming from 14 different nationalities. | 6 emotions: anger, fear, surprise, happiness, sadness and disgust. | Audio, Video | ~0.8 GB | German | The eNTERFACE’05 Audio-Visual Emotion Database | Open access | Free of charge for research purposes only |
| DES | 2002 | 4 speakers (2 males and 2 females). | 5 emotions: neutral, surprise, happiness, sadness and anger. | -- | -- | Danish | Documentation of the Danish Emotional Speech Database | -- | -- |
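
Readers who want to narrow the table down, for example to openly accessible corpora in one language, could do something like the following. This is only a minimal sketch: the `Dataset` record, the `select` helper and the hand-copied `DATASETS` subset are illustrative assumptions and are not part of this repository.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record holding a few of the table's columns."""
    name: str
    year: int
    languages: Tuple[str, ...]
    access: str


# A handful of rows copied by hand from the table above (not the full list).
DATASETS = [
    Dataset("URDU-Dataset", 2020, ("Urdu",), "Open access"),
    Dataset("BAVED", 2020, ("Arabic",), "Open access"),
    Dataset("DEMoS", 2019, ("Italian",), "Restricted access"),
    Dataset("ANAD", 2018, ("Arabic",), "Open access"),
    Dataset("EMOVO", 2014, ("Italian",), "Open access"),
    Dataset("EMO-DB", 2005, ("German",), "Open access"),
]


def select(datasets: Sequence[Dataset],
           language: Optional[str] = None,
           access: Optional[str] = None) -> list:
    """Return matching datasets, newest first (same order as the table)."""
    matches = []
    for d in datasets:
        if language is not None and language not in d.languages:
            continue  # dataset does not cover the requested language
        if access is not None and d.access != access:
            continue  # access level differs from the requested one
        matches.append(d)
    return sorted(matches, key=lambda d: d.year, reverse=True)


open_italian = select(DATASETS, language="Italian", access="Open access")
print([d.name for d in open_italian])
```

Languages are stored as a tuple because some entries in the table (SEWA, EmoFilm) cover several languages.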

References

  • Swain, Monorama & Routray, Aurobinda & Kabisatpathy, Prithviraj, Databases, features and classifiers for speech emotion recognition: a review, International Journal of Speech Technology, paper
  • Dimitrios Ververidis and Constantine Kotropoulos, A State of the Art Review on Emotional Speech Databases, Artificial Intelligence & Information Analysis Laboratory, Department of Informatics, Aristotle University of Thessaloniki, paper
  • A. Pramod Reddy and V. Vijayarajan, Extraction of Emotions from Speech-A Survey, VIT University, International Journal of Applied Engineering Research, paper
  • Emotional Speech Databases, document
  • Expressive Synthetic Speech, website
  • Untitled, Technical university Munich, document

Contribution

  • All contributions are welcome! If you know of a dataset that belongs here (see criteria) but is not listed, please feel free to add it. For more information on contributing, please refer to CONTRIBUTING.md.

  • If you notice a typo or a mistake, please report this as an issue and help us improve the quality of this list.

Disclaimer

  • The maintainer and the contributors try their best to keep this list up to date and to include only working links (using automated verification with the help of the urlchecker-action). However, we cannot guarantee that all listed links are up to date. Read more in DISCLAIMER.md.
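
Since links can still go stale between automated checks, a reader may want to verify a few of them locally before relying on a dataset. The sketch below uses only the Python standard library; it is not how the urlchecker-action works, and `link_status` and `broken_links` are hypothetical helper names chosen for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional


def link_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if it cannot be reached."""
    try:
        # Building the request inside the try block also catches malformed URLs.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (OSError, ValueError):
        return None  # DNS failure, timeout, refused connection, bad URL


def broken_links(urls: Iterable[str],
                 status_of: Callable[[str], Optional[int]] = link_status) -> List[str]:
    """Return the URLs that are unreachable or answer with a status >= 400."""
    broken = []
    for url in urls:
        status = status_of(url)
        if status is None or status >= 400:
            broken.append(url)
    return broken
```

Some servers reject `HEAD` requests, so a link reported here as broken may still work in a browser; treat the result as a hint rather than a verdict.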