Spoken Emotion Recognition Datasets: A collection of datasets (count=43) for the purpose of emotion recognition/detection in speech. The table is chronologically ordered and includes a description of the content of each dataset along with the emotions included. The table can be browsed, sorted and searched under https://superkogito.github.io/SER-datasets/
Dataset | Year | Content | Emotions | Format | Size | Language | Paper | Access | License |
---|---|---|---|---|---|---|---|---|---|
MESD | 2022 | 864 audio files of single-word emotional utterances with Mexican cultural shaping. | 6 emotions: anger, disgust, fear, happiness, neutral, and sadness. | Audio | 0.097 GB | Spanish (Mexican) | The Mexican Emotional Speech Database (MESD): elaboration and assessment based on machine learning | Open | CC BY 4.0 |
SyntAct | 2022 | Synthesized database of three basic emotions and neutral expression, based on rule-based manipulation for a diphone synthesizer. | 997 utterances including 6 emotions: angry, bored, happy, neutral, sad and scared | Audio | 941 MB | German | SyntAct: A Synthesized Database of Basic Emotions | Open | CC BY-SA 4.0 |
MLEnd | 2021 | ~32700 audio recording files produced by 154 speakers. Each audio recording corresponds to one English numeral (from "zero" to "billion"). | Intonations: neutral, bored, excited and question | Audio | 2.27 GB | -- | -- | Open | Unknown |
ASVP-ESD | 2021 | ~13285 audio files collected from movies, tv shows and youtube containing speech and non-speech. | 12 different natural emotions (boredom, neutral, happiness, sadness, anger, fear, surprise, disgust, excitement, pleasure, pain, disappointment) with 2 levels of intensity. | Audio | 2 GB | Chinese, English, French, Russian and others | -- | Open | Unknown |
ESD | 2021 | 29 hours, 3500 sentences, by 10 native English speakers and 10 native Chinese speakers. | 5 emotions: angry, happy, neutral, sad, and surprise. | Audio, Text | 2.4 GB (zip) | Chinese, English | Seen And Unseen Emotional Style Transfer For Voice Conversion With A New Emotional Speech Dataset | Open | Academic License |
MuSe-CAR | 2021 | 40 hours, 6,000+ recordings of 25,000+ sentences by 70+ English speakers (see db link for details). | continuous emotion dimensions characterized using valence, arousal, and trustworthiness. | Audio, Video, Text | 15 GB | English | The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements | Restricted | Academic License & Commercial License |
MSP-Podcast corpus | 2020 | 100 hours by over 100 speakers (see db link for details). | This corpus is annotated with emotional labels using attribute-based descriptors (activation, dominance and valence) and categorical labels (anger, happiness, sadness, disgust, surprised, fear, contempt, neutral and other). | Audio | -- | -- | The MSP-Conversation Corpus | Restricted | Academic License & Commercial License |
emotiontts open db | 2020 | Recordings and their associated transcriptions by a diverse group of speakers. | 4 emotions: general, joy, anger, and sadness. | Audio, Text | -- | Korean | -- | Partially open | CC BY-NC-SA 4.0 |
URDU-Dataset | 2020 | 400 utterances by 38 speakers (27 male and 11 female). | 4 emotions: angry, happy, neutral, and sad. | Audio | 0.072 GB | Urdu | Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages | Open | -- |
BAVED | 2020 | 1935 recordings by 61 speakers (45 male and 16 female). | 3 levels of emotion. | Audio | 0.195 GB | Arabic | -- | Open | -- |
VIVAE | 2020 | non-speech, 1085 audio files by 12 speakers. | non-speech 6 emotions: achievement, anger, fear, pain, pleasure, and surprise with 4 emotional intensities (low, moderate, strong, peak). | Audio | -- | -- | -- | Restricted | CC BY-NC-SA 4.0 |
SEWA | 2019 | more than 2000 minutes of audio-visual data of 398 people (201 male and 197 female) coming from 6 cultures. | emotions are characterized using valence and arousal. | Audio, Video | -- | Chinese, English, German, Greek, Hungarian and Serbian | SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild | Restricted | SEWA EULA |
MELD | 2019 | 1400 dialogues and 14000 utterances from Friends TV series by multiple speakers. | 7 emotions: Anger, disgust, sadness, joy, neutral, surprise and fear. MELD also has sentiment (positive, negative and neutral) annotation for each utterance. | Audio, Video, Text | 10.1 GB | English | MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations | Open | MELD: GPL-3.0 License |
ShEMO | 2019 | 3000 semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data from online radio plays by 87 native-Persian speakers. | 6 emotions: anger, fear, happiness, sadness, neutral and surprise. | Audio | 0.101 GB | Persian | ShEMO: a large-scale validated database for Persian speech emotion detection | Open | -- |
DEMoS | 2019 | 9365 emotional and 332 neutral samples produced by 68 native speakers (23 females, 45 males). | 7/6 emotions: anger, sadness, happiness, fear, surprise, disgust, and the secondary emotion guilt. | Audio | -- | Italian | DEMoS: An Italian emotional speech corpus. Elicitation methods, machine learning, and perception | Restricted | EULA: End User License Agreement |
AESDD | 2018 | around 500 utterances by a diverse group of actors (over 5 actors) simulating various emotions. | 5 emotions: anger, disgust, fear, happiness, and sadness. | Audio | 0.392 GB | Greek | Speech Emotion Recognition for Performance Interaction | Open | -- |
Emov-DB | 2018 | Recordings of 4 speakers (2 males and 2 females). | The emotional styles are neutral, sleepiness, anger, disgust and amused. | Audio | 5.88 GB | English | The emotional voices database: Towards controlling the emotion dimension in voice generation systems | Open | -- |
RAVDESS | 2018 | 7356 recordings by 24 actors. | 7 emotions: calm, happy, sad, angry, fearful, surprise, and disgust | Audio, Video | 24.8 GB | English | The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English | Open | CC BY-NC-SA 4.0 |
JL corpus | 2018 | 2400 recordings of 240 sentences by 4 actors (2 males and 2 females). | 5 primary emotions: angry, sad, neutral, happy, excited. 5 secondary emotions: anxious, apologetic, pensive, worried, enthusiastic. | Audio | -- | English | An Open Source Emotional Speech Corpus for Human Robot Interaction Applications | Open | CC0 1.0 |
CaFE | 2018 | 6 different sentences by 12 speakers (6 females + 6 males). | 7 emotions: happy, sad, angry, fearful, surprise, disgust and neutral. Each emotion is acted in 2 different intensities. | Audio | 2 GB | French (Canadian) | -- | Open | CC BY-NC-SA 4.0 |
EmoFilm | 2018 | 1115 audio instances of sentences extracted from various films. | 5 emotions: anger, contempt, happiness, fear, and sadness. | Audio | -- | English, Italian & Spanish | Categorical vs Dimensional Perception of Italian Emotional Speech | Restricted | EULA: End User License Agreement |
ANAD | 2018 | 1384 recordings by multiple speakers. | 3 emotions: angry, happy, surprised. | Audio | 2 GB | Arabic | Arabic Natural Audio Dataset | Open | CC BY-NC-SA 4.0 |
EmoSynth | 2018 | 144 audio files labelled by 40 listeners. | Emotion (no speech) defined in terms of valence and arousal. | Audio | 0.1034 GB | -- | The Perceived Emotion of Isolated Synthetic Audio: The EmoSynth Dataset and Results | Open | CC BY 4.0 |
CMU-MOSEI | 2018 | 65 hours of annotated video from more than 1000 speakers and 250 topics. | 6 emotions (happiness, sadness, anger, fear, disgust, surprise) + Likert scale. | Audio, Video | -- | English | Multi-attention Recurrent Network for Human Communication Comprehension | Open | CMU-MOSEI License |
VERBO | 2018 | 14 different phrases by 12 speakers (6 female + 6 male) for a total of 1167 recordings. | 7 emotions: Happiness, Disgust, Fear, Neutral, Anger, Surprise, Sadness | Audio | -- | Portuguese | VERBO: Voice Emotion Recognition dataBase in Portuguese Language | Restricted | Available for research purposes only |
CMU-MOSI | 2017 | 2199 opinion utterances with annotated sentiment. | Sentiment annotated from very negative to very positive in seven Likert steps. | Audio, Video | -- | English | Multi-attention Recurrent Network for Human Communication Comprehension | Open | CMU-MOSI License |
MSP-IMPROV | 2017 | 20 sentences by 12 actors. | 4 emotions: angry, sad, happy, neutral, plus the labels "other" and "without agreement". | Audio, Video | -- | English | MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception | Restricted | Academic License & Commercial License |
CREMA-D | 2017 | 7442 clips of 12 sentences spoken by 91 actors (48 males and 43 females). | 6 emotions: angry, disgusted, fearful, happy, neutral, and sad | Audio, Video | -- | English | CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset | Open | Open Database License & Database Content License |
Example emotion videos used in investigation of emotion perception in schizophrenia | 2017 | 6 videos: two example videos for each emotion category (angry, happy and neutral) by one female speaker. | 3 emotions: angry, happy and neutral. | Audio, Video | 0.063 GB | English | -- | Open | Permitted Non-commercial Re-use with Acknowledgment |
EMOVO | 2014 | 6 actors who played 14 sentences. | 6 emotions: disgust, fear, anger, joy, surprise, sadness. | Audio | 0.355 GB | Italian | EMOVO Corpus: an Italian Emotional Speech Database | Open | -- |
RECOLA | 2013 | 3.8 hours of recordings by 46 participants. | negative and positive sentiment (valence and arousal). | Audio, Video | -- | -- | Introducing the RECOLA Multimodal Corpus of Remote Collaborative and Affective Interactions | Restricted | Academic License & Commercial License |
GEMEP corpus | 2012 | Videos of 10 actors portraying 10 states. | 12 emotions: amusement, anxiety, cold anger (irritation), despair, hot anger (rage), fear (panic), interest, joy (elation), pleasure (sensory), pride, relief, and sadness. Plus, 5 additional emotions: admiration, contempt, disgust, surprise, and tenderness. | Audio, Video | -- | French | Introducing the Geneva Multimodal Expression Corpus for Experimental Research on Emotion Perception | Restricted | -- |
OGVC | 2012 | 9114 spontaneous utterances and 2656 acted utterances by 4 professional actors (two male and two female). | 9 emotional states: fear, surprise, sadness, disgust, anger, anticipation, joy, acceptance and the neutral state. | Audio | -- | Japanese | Naturalistic emotional speech collection paradigm with online game and its psychological and acoustical assessment | Restricted | -- |
LEGO corpus | 2012 | 347 dialogs with 9,083 system-user exchanges. | Emotions classified as garbage, non-angry, slightly angry and very angry. | Audio | 1.1 GB | -- | A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let’s Go Bus Information System | Open | License available with the data. Free of charges for research purposes only. |
SEMAINE | 2012 | 95 dyadic conversations from 21 subjects. Each subject converses with another playing one of four characters with emotions. | 5 FeelTrace annotations: activation, valence, dominance, power, intensity | Audio, Video, Text | 104 GB | English | The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent | Restricted | Academic EULA |
SAVEE | 2011 | 480 British English utterances by 4 male actors. | 7 emotions: anger, disgust, fear, happiness, sadness, surprise and neutral. | Audio, Video | -- | English (British) | Multimodal Emotion Recognition | Restricted | Free of charges for research purposes only. |
TESS | 2010 | 2800 recordings by 2 actresses. | 7 emotions: anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral. | Audio | -- | English | BEHAVIOURAL FINDINGS FROM THE TORONTO EMOTIONAL SPEECH SET | Open | CC BY-NC-ND 4.0 |
EEKK | 2007 | 26 text passages read by 10 speakers. | 4 main emotions: joy, sadness, anger and neutral. | -- | 0.352 GB | Estonian | Estonian Emotional Speech Corpus | Open | CC-BY license |
IEMOCAP | 2007 | 12 hours of audiovisual data by 10 actors. | 5 emotions: happiness, anger, sadness, frustration and neutral. | -- | -- | English | IEMOCAP: Interactive emotional dyadic motion capture database | Restricted | IEMOCAP license |
Keio-ESD | 2006 | A set of human speech with vocal emotion spoken by a Japanese male speaker. | 47 emotions including angry, joyful, disgusting, downgrading, funny, worried, gentle, relief, indignation, shameful, etc. | Audio | -- | Japanese | EMOTIONAL SPEECH SYNTHESIS USING SUBSPACE CONSTRAINTS IN PROSODY | Restricted | Available for research purposes only. |
EMO-DB | 2005 | 800 recordings spoken by 10 actors (5 males and 5 females). | 7 emotions: anger, neutral, fear, boredom, happiness, sadness, disgust. | Audio | -- | German | A Database of German Emotional Speech | Open | -- |
eNTERFACE05 | 2005 | Videos by 42 subjects, coming from 14 different nationalities. | 6 emotions: anger, fear, surprise, happiness, sadness and disgust. | Audio, Video | 0.8 GB | German | -- | Open | Free of charges for research purposes only. |
DES | 2002 | 4 speakers (2 males and 2 females). | 5 emotions: neutral, surprise, happiness, sadness and anger | -- | -- | Danish | Documentation of the Danish Emotional Speech Database | -- | -- |
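
The table can also be filtered programmatically. The following is a minimal sketch, assuming the rows are available as pipe-delimited text lines in the layout shown above; the helper names `parse_row` and `select` are hypothetical and not part of this repository, and the sample rows are copied from the table.

```python
# Column names as they appear in the table header above.
COLUMNS = ["Dataset", "Year", "Content", "Emotions", "Format",
           "Size", "Language", "Paper", "Access", "License"]

# A few rows copied verbatim from the table (illustrative subset only).
ROWS = [
    "URDU-Dataset | 2020 | 400 utterances by 38 speakers (27 male and 11 female). | 4 emotions: angry, happy, neutral, and sad. | Audio | 0.072 GB | Urdu | Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages | Open | -- |",
    "ShEMO | 2019 | 3000 semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data from online radio plays by 87 native-Persian speakers. | 6 emotions: anger, fear, happiness, sadness, neutral and surprise. | Audio | 0.101 GB | Persian | ShEMO: a large-scale validated database for Persian speech emotion detection | Open | -- |",
    "EMOVO | 2014 | 6 actors who played 14 sentences. | 6 emotions: disgust, fear, anger, joy, surprise, sadness. | Audio | 0.355 GB | Italian | EMOVO Corpus: an Italian Emotional Speech Database | Open | -- |",
    "IEMOCAP | 2007 | 12 hours of audiovisual data by 10 actors. | 5 emotions: happiness, anger, sadness, frustration and neutral. | -- | -- | English | IEMOCAP: Interactive emotional dyadic motion capture database | Restricted | IEMOCAP license |",
]


def parse_row(line):
    """Split one pipe-delimited table row into a dict keyed by column name."""
    # Drop the trailing pipe, then split on the remaining pipes.
    cells = [cell.strip() for cell in line.strip().rstrip("|").split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    return dict(zip(COLUMNS, cells))


def select(records, **criteria):
    """Keep the records whose columns exactly match every given value."""
    return [r for r in records
            if all(r[col] == value for col, value in criteria.items())]


records = [parse_row(line) for line in ROWS]
open_sets = select(records, Access="Open")
print([r["Dataset"] for r in open_sets])
```

Exact string matching keeps the sketch short; multi-language cells such as "Chinese, English" would need a substring or split-based comparison instead.
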
- Swain, Monorama & Routray, Aurobinda & Kabisatpathy, Prithviraj, Databases, features and classifiers for speech emotion recognition: a review, International Journal of Speech Technology, paper
- Dimitrios Ververidis and Constantine Kotropoulos, A State of the Art Review on Emotional Speech Databases, Artificial Intelligence & Information Analysis Laboratory, Department of Informatics, Aristotle University of Thessaloniki, paper
- A. Pramod Reddy and V. Vijayarajan, Extraction of Emotions from Speech-A Survey, VIT University, International Journal of Applied Engineering Research, paper
- Emotional Speech Databases, document
- Expressive Synthetic Speech, website
- Towards a standard set of acoustic features for the processing of emotion in speech, Technical University of Munich, document
- All contributions are welcome! If you know a dataset that belongs here (see criteria) but is not listed, please feel free to add it. For more information on contributing, please refer to CONTRIBUTING.md.
- If you notice a typo or a mistake, please report it as an issue and help us improve the quality of this list.
- The maintainer and the contributors try their best to keep this list up-to-date and to include only working links (using automated verification with the help of the urlchecker-action). However, we cannot guarantee that all listed links are up-to-date. Read more in DISCLAIMER.md.