Spoken Emotion Recognition Datasets: A collection of datasets for emotion recognition/detection in speech. The table is in reverse chronological order (newest first) and describes the content of each dataset along with the emotions it covers.
Dataset | Year | Content | Emotions | Format | Size | Language | Paper | Access | License |
---|---|---|---|---|---|---|---|---|---|
MuSe-CaR | 2021 | 40 hours, 6,000+ recordings of 25,000+ sentences by 70+ English speakers (see db link for details). | Continuous emotion dimensions characterized using valence, arousal, and trustworthiness. | Audio, Video, Text | 15 GB | English | The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements | Restricted access | Available under an Academic License & Commercial License |
MSP-Podcast corpus | 2020 | 100 hours by over 100 speakers (see db link for details). | Annotated with attribute-based descriptors (activation, dominance, and valence) and categorical labels (anger, happiness, sadness, disgust, surprise, fear, contempt, neutral, and other). | Audio | -- | -- | The MSP-Conversation Corpus | Restricted access | Available under an Academic License & Commercial License |
emotiontts_open_db | 2020 | Recordings and their associated transcriptions by a diverse group of speakers. | 4 emotions: general, joy, anger, and sadness. | Audio, Text | -- | Korean | -- | Partial open access | CC BY-NC-SA 4.0 |
URDU-Dataset | 2020 | 400 utterances by 38 speakers (27 male and 11 female). | 4 emotions: angry, happy, neutral, and sad. | Audio | ~72.1 MB | Urdu | Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages | Open access | None specified |
BAVED | 2020 | 1,935 recordings by 61 speakers (45 male and 16 female). | 3 levels of emotion. | Audio | ~195 MB | Arabic | -- | Open access | None specified |
VIVAE | 2020 | 1,085 non-speech audio files by ~12 speakers. | 6 emotions in non-speech vocalizations: achievement, anger, fear, pain, pleasure, and surprise, each at 4 emotional intensities (low, moderate, strong, peak). | Audio | -- | -- | -- | Restricted access | CC BY-NC-SA 4.0 |
SEWA | 2019 | More than 2,000 minutes of audio-visual data of 398 people (201 male and 197 female) from 6 cultures. | Emotions are characterized using valence and arousal. | Audio, Video | -- | Chinese, English, German, Greek, Hungarian and Serbian | SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild | Restricted access | EULA |
MELD | 2019 | 1,400 dialogues and 14,000 utterances from the Friends TV series by multiple speakers. | 7 emotions: anger, disgust, sadness, joy, neutral, surprise, and fear. MELD also has a sentiment annotation (positive, negative, or neutral) for each utterance. | Audio, Video, Text | ~10.1 GB | English | MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations | Open access | GPL-3.0 License |
ShEMO | 2019 | 3,000 semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data from online radio plays, by 87 native Persian speakers. | 6 emotions: anger, fear, happiness, sadness, neutral, and surprise. | Audio | ~1014 MB | Persian | ShEMO: a large-scale validated database for Persian speech emotion detection | Open access | None specified |
DEMoS | 2019 | 9,365 emotional and 332 neutral samples produced by 68 native speakers (23 females, 45 males). | 7 emotions: 6 primary (anger, sadness, happiness, fear, surprise, and disgust) plus the secondary emotion guilt. | Audio | -- | Italian | DEMoS: An Italian emotional speech corpus. Elicitation methods, machine learning, and perception | Restricted access | EULA: End User License Agreement |
AESDD | 2018 | Around 500 utterances by a diverse group of actors (more than 5) simulating various emotions. | 5 emotions: anger, disgust, fear, happiness, and sadness. | Audio | ~392 MB | Greek | Speech Emotion Recognition for Performance Interaction | Open access | None specified |
Emov-DB | 2018 | Recordings by 4 speakers (2 males and 2 females). | 5 emotional styles: neutral, sleepiness, anger, disgust, and amused. | Audio | 5.88 GB | English | The emotional voices database: Towards controlling the emotion dimension in voice generation systems | Open access | None specified |
RAVDESS | 2018 | 7,356 recordings by 24 actors. | 7 emotions: calm, happy, sad, angry, fearful, surprise, and disgust. | Audio, Video | ~24.8 GB | English | The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English | Open access | CC BY-NC-SA 4.0 |
JL corpus | 2018 | 2,400 recordings of 240 sentences by 4 actors (2 males and 2 females). | 5 primary emotions: angry, sad, neutral, happy, excited. 5 secondary emotions: anxious, apologetic, pensive, worried, enthusiastic. | Audio | -- | English | An Open Source Emotional Speech Corpus for Human Robot Interaction Applications | Open access | CC0 1.0 |
CaFE | 2018 | 6 different sentences by 12 speakers (6 females and 6 males). | 7 emotions: happy, sad, angry, fearful, surprise, disgust, and neutral. Each emotion is acted at 2 different intensities. | Audio | ~2 GB | French (Canadian) | -- | Open access | CC BY-NC-SA 4.0 |
EmoFilm | 2018 | 1,115 audio instances (sentences) extracted from various films. | 5 emotions: anger, contempt, happiness, fear, and sadness. | Audio | -- | English, Italian & Spanish | Categorical vs Dimensional Perception of Italian Emotional Speech | Restricted access | EULA: End User License Agreement |
ANAD | 2018 | 1,384 recordings by multiple speakers. | 3 emotions: angry, happy, and surprised. | Audio | ~2 GB | Arabic | Arabic Natural Audio Dataset | Open access | CC BY-NC-SA 4.0 |
EmoSynth | 2018 | 144 audio files labelled by 40 listeners. | Perceived emotion (no speech) defined in terms of valence and arousal. | Audio | 103.4 MB | -- | The Perceived Emotion of Isolated Synthetic Audio: The EmoSynth Dataset and Results | Open access | Creative Commons Attribution 4.0 International |
CMU-MOSEI | 2018 | 65 hours of annotated video from more than 1,000 speakers and 250 topics. | 6 emotions (happiness, sadness, anger, fear, disgust, surprise) plus Likert-scale sentiment. | Audio, Video | -- | English | Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph | Open access | License |
CMU-MOSI | 2017 | 2,199 opinion utterances with annotated sentiment. | Sentiment annotated on a seven-point Likert scale ranging from very negative to very positive. | Audio, Video | -- | English | Multi-attention Recurrent Network for Human Communication Comprehension | Open access | License |
MSP-IMPROV | 2017 | 20 sentences by 12 actors. | 4 emotions: angry, sad, happy, and neutral, plus "other" and "without agreement" labels. | Audio, Video | -- | English | MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception | Restricted access | Available under an Academic License & Commercial License |
CREMA-D | 2017 | 7,442 clips of 12 sentences spoken by 91 actors (48 males and 43 females). | 6 emotions: angry, disgusted, fearful, happy, neutral, and sad. | Audio, Video | -- | English | CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset | Open access | Available under the Open Database License & Database Content License |
Example emotion videos used in investigation of emotion perception in schizophrenia | 2017 | 6 videos: two example videos from each emotion category (angry, happy, and neutral) by one female speaker. | 3 emotions: angry, happy, and neutral. | Audio, Video | ~63 MB | English | -- | Open access | Available under the Permitted Non-commercial Re-use with Acknowledgement license |
EMOVO | 2014 | 14 sentences performed by 6 actors. | 6 emotions: disgust, fear, anger, joy, surprise, and sadness. | Audio | ~355 MB | Italian | EMOVO Corpus: an Italian Emotional Speech Database | Open access | None specified |
RECOLA | 2013 | 3.8 hours of recordings by 46 participants. | Emotion characterized by valence (negative to positive) and arousal. | Audio, Video | -- | -- | Introducing the RECOLA Multimodal Corpus of Remote Collaborative and Affective Interactions | Restricted access | Available under an Academic License & Commercial License |
GEMEP corpus | 2012 | Videos of 10 actors portraying 10 states. | 12 emotions: amusement, anxiety, cold anger (irritation), despair, hot anger (rage), fear (panic), interest, joy (elation), pleasure (sensory), pride, relief, and sadness, plus 5 additional emotions: admiration, contempt, disgust, surprise, and tenderness. | Audio, Video | -- | French | Introducing the Geneva Multimodal Expression Corpus for Experimental Research on Emotion Perception | Restricted access | None specified |
OGVC | 2012 | 9,114 spontaneous utterances and 2,656 acted utterances by 4 professional actors (two male and two female). | 9 emotional states: fear, surprise, sadness, disgust, anger, anticipation, joy, acceptance, and the neutral state. | Audio | -- | Japanese | Naturalistic emotional speech collection paradigm with online game and its psychological and acoustical assessment | Restricted access | None specified |
LEGO corpus | 2012 | 347 dialogs with 9,083 system-user exchanges. | Emotions classified as garbage, non-angry, slightly angry, and very angry. | Audio | 1.1 GB | -- | A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let’s Go Bus Information System | Open access | License available with the data; free of charge for research purposes only. |
SEMAINE | 2012 | 95 dyadic conversations from 21 subjects. Each subject converses with another person playing one of four characters, each with a distinct emotional style. | 5 FeelTrace annotations: activation, valence, dominance, power, and intensity. | Audio, Video, Text | 104 GB | English | The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent | Restricted access | Academic EULA |
SAVEE | 2011 | 480 British English utterances by 4 male actors. | 7 emotions: anger, disgust, fear, happiness, sadness, surprise, and neutral. | Audio, Video | -- | English (British) | Multimodal Emotion Recognition | Restricted access | Free of charge for research purposes only. |
TESS | 2010 | 2,800 recordings by 2 actresses. | 7 emotions: anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral. | Audio | -- | English | Behavioural Findings from the Toronto Emotional Speech Set | Open access | CC BY-NC-ND 4.0 |
EEKK | 2007 | 26 text passages read by 10 speakers. | 4 main emotions: joy, sadness, anger, and neutral. | -- | ~352 MB | Estonian | Estonian Emotional Speech Corpus | Open access | CC-BY license |
IEMOCAP | 2007 | 12 hours of audiovisual data by 10 actors. | 5 emotions: happiness, anger, sadness, frustration, and neutral. | Audio, Video | -- | English | IEMOCAP: Interactive emotional dyadic motion capture database | Restricted access | License |
Keio-ESD | 2006 | Emotional speech spoken by a single Japanese male speaker. | 47 emotions including angry, joyful, disgusting, downgrading, funny, worried, gentle, relief, indignation, shameful, etc. | Audio | -- | Japanese | Emotional Speech Synthesis Using Subspace Constraints in Prosody | Restricted access | Available for research purposes only |
EMO-DB | 2005 | 800 recordings spoken by 10 actors (5 males and 5 females). | 7 emotions: anger, neutral, fear, boredom, happiness, sadness, and disgust. | Audio | -- | German | A Database of German Emotional Speech | Open access | None specified |
eNTERFACE05 | 2005 | Videos of 42 subjects from 14 different nationalities. | 6 emotions: anger, fear, surprise, happiness, sadness, and disgust. | Audio, Video | ~0.8 GB | English | The eNTERFACE’05 Audio-Visual Emotion Database | Open access | Free of charge for research purposes only |
DES | 2002 | 4 speakers (2 males and 2 females). | 5 emotions: neutral, surprise, happiness, sadness, and anger. | -- | -- | Danish | Documentation of the Danish Emotional Speech Database | -- | -- |
- Monorama Swain, Aurobinda Routray, and Prithviraj Kabisatpathy, Databases, features and classifiers for speech emotion recognition: a review, International Journal of Speech Technology (paper)
- Dimitrios Ververidis and Constantine Kotropoulos, A State of the Art Review on Emotional Speech Databases, Artificial Intelligence & Information Analysis Laboratory, Department of Informatics, Aristotle University of Thessaloniki (paper)
- A. Pramod Reddy and V. Vijayarajan, Extraction of Emotions from Speech: A Survey, VIT University, International Journal of Applied Engineering Research (paper)
- Emotional Speech Databases (document)
- Expressive Synthetic Speech (website)
- Untitled, Technical University of Munich (document)
All contributions are welcome! If you know a dataset that belongs here (see criteria) but is not listed, please feel free to add it. For more information on contributing, please refer to CONTRIBUTING.md.
If you notice a typo or a mistake, please report this as an issue and help us improve the quality of this list.
- The maintainer and contributors try their best to keep this list up to date and to include only working links (verified automatically with the help of the urlchecker-action). However, we cannot guarantee that all listed links are up to date. Read more in DISCLAIMER.md.