/datasets_emotion

This repository collects information about datasets for Music Emotion Recognition (MER) and Speech Emotion Recognition (SER).

MER and SER Data sets

Summary

Music Emotion Recognition Data sets

| Dataset | Year | Content | Format | Size | Annotation Type |
| --- | --- | --- | --- | --- | --- |
| 4Q audio emotion | 2018 | 900 excerpts (30s long) | MP3 | 291MB | Categorical (quadrants) |
| DEAM/MediaEval | 2018 | 2058 excerpts (45s long) | MP3 | 1.4GB | Dimensional (time-continuous A-V) |
| Soundtracks | 2011 | 360+110 excerpts (15s-1m long) | MP3 | 216MB | Categorical (tension, anger, fear, happy, sad, tender) and dimensional (valence, energy, tension) |
| MoodsMIREX | 2008 | 269 excerpts (30s long) | MP3 | 736MB | Categorical (5 mood clusters) |
| NTWICM | 2010 | 2648 full songs | MP3 | 11.7GB | Dimensional (discrete A-V values) |
| CH818 | 2017 | 818 excerpts (30s long) | MP3 | 393MB | Dimensional (A-V) |
| Emotify | ? | ? | MP3 | ? | Categorical (adjectives) |

Speech Emotion Recognition Data sets

| Dataset | Year | Content | Format | Size | Annotation Type | Language |
| --- | --- | --- | --- | --- | --- | --- |
| RECOLA/AVEC2015 | 2013 | 9.5 hours of multimodal recordings | WAV, MP4, CSV | 3.6GB | Dimensional (time-continuous A-V) | French |
| SEMAINE-SolidSAL | 2012 | 7.5 hours of speech recordings | WAV, AVI | 38.7GB | Dimensional (time-continuous A-V) | English |
| SEWA | 2019 | 44 hours of multimodal recordings | ? | ? | Dimensional (time-continuous A-V) | English (British), German, Hungarian, Greek, Serbian, Mandarin |
| MMDBESP | 2019 | xx hours of video recordings | MP4 | 10.6GB | Categorical (Fear, Surprise, Anger, Sadness, Happiness, Disgust) | Polish |

Additional information

Music Emotion Recognition Data sets

4Q Emotion Dataset

  • Author: Panda R., Malheiro R., Paiva R. P.
  • Description and music styles: The AllMusic API served as the source of musical information, providing metadata such as artist, title, genre, and emotion information, as well as 30-second audio clips for most songs. The music is mostly mainstream, popularly consumed styles.
  • Annotation strategy: Emotion tags were collected from the AllMusic API and filtered by intersecting them with Warriner’s list of affective words. Finally, a manual blind validation was conducted by human subjects.
  • Paper: Panda R., Malheiro R., Paiva R. P. (2018). “Musical Texture and Expressivity Features for Music Emotion Recognition”. 19th International Society for Music Information Retrieval Conference – ISMIR 2018, Paris, France.
  • Annotation (categorical): Q1 (A+V+), Q2 (A+V-), Q3 (A-V-), Q4 (A-V+); see the mapping sketch after this list
  • Link: http://mir.dei.uc.pt/downloads.html
  • Languages: English: 724, Spanish: 24, French: 0, German: 0, Other: 16, Unknown: 136, Total (from Metadata): 899
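
The quadrant labels correspond to the four quadrants of Russell's arousal-valence plane. A minimal sketch of the mapping, assuming both dimensions are centred at zero (the function name is ours, not the dataset's):

```python
def av_to_quadrant(arousal: float, valence: float) -> str:
    """Map an (arousal, valence) pair to its 4Q quadrant label.

    Assumes both values are centred at 0, with positive meaning
    high arousal / positive valence.
    """
    if arousal >= 0:
        return "Q1" if valence >= 0 else "Q2"  # A+V+ / A+V-
    return "Q3" if valence < 0 else "Q4"       # A-V- / A-V+
```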

DEAM: MediaEval Database for Emotional Analysis in Music

  • Author: Soleymani, M., Aljanaki, A., Yang, Y.
  • Description and music styles: Royalty-free music from several sources: freemusicarchive.org (FMA), jamendo.com, and the MedleyDB dataset. The annotated excerpts (song IDs 1 to 2058) are included in the same package. The dataset consists of the 2014 development set (744 songs), the 2014 evaluation set (1000 songs), and the 2015 evaluation set (58 songs). Genres include rock, pop, soul, blues, electronic, classical, hip-hop, international, experimental, folk, jazz, and country.
  • Annotation strategy: Crowdsourced on Amazon Mechanical Turk, with each excerpt annotated by at least 10 workers. Arousal and valence were annotated separately. Additional static annotations were collected for the whole 45-second excerpts after the dynamic annotations (see the aggregation sketch after this list).
  • Paper: Aljanaki A., Yang Y., Soleymani M. (2017) Developing a benchmark for emotional analysis of music. PLoS ONE 12(3): e0173392. https://doi.org/10.1371/journal.pone.0173392
  • Annotation (dimensional): Time-continuous arousal and valence annotation (1 Hz)
  • Link: http://cvml.unige.ch/databases/DEAM/
  • Languages: English: 1531, Spanish: 13, French: 0, German: 0, Other: 47, Unknown 211, Total (from Metadata): 1801
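
A common first step with such per-worker dynamic traces is to average them into one gold-standard curve per song. A hypothetical sketch; the array layout is an assumption for illustration, not DEAM's actual file format:

```python
import numpy as np

def gold_standard(worker_traces: np.ndarray):
    """Fuse per-worker annotation traces into a single curve.

    worker_traces: shape (n_workers, n_timesteps), one arousal or
    valence trace per row on a common time grid. Returns the
    per-timestep mean plus the standard deviation as a rough
    measure of annotator disagreement.
    """
    return worker_traces.mean(axis=0), worker_traces.std(axis=0)

# e.g. 10 workers annotating a 45-second excerpt at 1 Hz
traces = np.random.uniform(-1.0, 1.0, size=(10, 45))
mean_curve, disagreement = gold_standard(traces)
```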

Soundtracks

  • Author: Eerola, T. & Vuoskoski, J. K.
  • Description and music styles: This dataset, which the authors refer to as "Film soundtracks", was designed to avoid the familiarity problems of well-known music: the excerpts are not widely known (although some could be identified by film aficionados).
  • Annotation strategy: The excerpts were selected according to both the dimensional and the discrete emotion model and evaluated in a pilot study. The initial ratings were made by 12 expert musicologists for both models; these ratings were then re-tested with 116 university students.
  • Paper: Eerola, T. & Vuoskoski, J. K. (2011). A comparison of the discrete and dimensional models of emotion in music. Psychology of Music, 39(1), 18-49. https://doi.org/10.1177/0305735610362821
  • Annotation: categorical and dimensional
  • Link: https://osf.io/p6vkg/
  • Languages: None (instrumental)

Moods MIREX

  • Author: Hu, X., Downie, J.S., Laurier, C., Bay, M., Ehmann, A.F.
  • Description and music styles: Selected from the libraries of Associated Production Music (APM), “the world’s leading production music library… offering every imaginable music genre from beautiful classical music recordings to vintage rock to current indie band sounds”.
  • Annotation strategy: Each piece was rated by 3 raters, and only the subset on which at least 2 of the 3 agreed was retained (see the agreement sketch after this list).
  • Paper: Hu, X., Downie, J. S., Laurier, C., Bay, M., Ehmann, A. F. (2008). “The 2007 MIREX Audio Mood Classification Task: Lessons Learned.” Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR 2008).
  • Annotation (categorical): Cluster 1 (passionate, rousing, confident, boisterous, rowdy), Cluster 2 (rollicking, cheerful, fun, sweet, amiable/good natured), Cluster 3 (literate, poignant, wistful, bittersweet, autumnal, brooding), Cluster 4 (humorous, silly, campy, quirky, whimsical, witty, wry), Cluster 5 (aggressive, fiery, tense/anxious, intense, volatile, visceral)
  • Link: Offline (on MTB-DB)
  • Languages: No metadata
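
A minimal sketch of that 2-out-of-3 agreement filter (the function name and label strings are ours):

```python
from collections import Counter

def majority_label(ratings, min_agreement=2):
    """Return the majority mood cluster if at least `min_agreement`
    raters chose it; otherwise None (the piece is discarded)."""
    label, count = Counter(ratings).most_common(1)[0]
    return label if count >= min_agreement else None

majority_label(["Cluster3", "Cluster3", "Cluster5"])  # -> "Cluster3"
majority_label(["Cluster1", "Cluster2", "Cluster4"])  # -> None
```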

NTWICM: Now That's What I Call Music

  • Author: Schuller, B., Dorfner, J., Rigoll, G.
  • Description and music styles: As a ground-truth music database, the compilation “Now That’s What I Call Music!” (U.K. series, volumes 1-69, double CDs each) was selected. It represents most currently popular music styles very well, ranging from pop and rock through rap and R&B to electronic dance music such as techno and house.
  • Annotation strategy: Four raters gave static annotations for complete songs, for arousal and valence, on the discrete scale [−2, −1, 0, 1, 2].
  • Paper: Björn Schuller, Johannes Dorfner, Gerhard Rigoll: "Determination of Non-Prototypical Valence and Arousal in Popular Music: Features and Performances", EURASIP Journal on Audio, Speech, and Music Processing (JASMP), Special Issue on "Scalable Audio-Content Analysis", vol. 2010, Article ID 735854, 19 pages, 2010.
  • Annotation (dimensional): Arousal and valence in [−2, −1, 0, 1, 2].
  • Link: Offline
  • Languages: English: 2491, Spanish: 5, French: 0, German: 0, Other: 6, Unknown 146, Total (from Metadata): 2648

CH818

  • Author: Hu, X., Yang, Y.
  • Description and music styles: Chinese Pop songs released in Taiwan, Hong Kong and Mainland China.
  • Annotation strategy: Each clip was annotated by three music experts who were born and raised in Mainland China and thus shared a Chinese cultural background. The annotation was done with an interface consisting of two sliders taking continuous real values in [-10, 10] (see the rescaling sketch after this list).
  • Paper: X. Hu and Y. Yang, "Cross-Dataset and Cross-Cultural Music Mood Prediction: A Case on Western and Chinese Pop Songs" in IEEE Transactions on Affective Computing, vol. 8, no. 02, pp. 228-240, 2017. doi: 10.1109/TAFFC.2016.2523503
  • Annotation (dimensional): Arousal and valence as real values in [-10, 10] for the whole excerpt.
  • Link: Not publicly available (provided on request by Yi-Hsuan Yang).
  • Languages: Mandarin: 818, Total (from Metadata): 818
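
Because CH818 uses a [-10, 10] scale while the other corpora above use different ranges, cross-dataset experiments such as those in the paper typically rescale annotations to a common range first. A minimal sketch; the target range here is our choice, not prescribed by the dataset:

```python
def rescale(value, src=(-10.0, 10.0), dst=(-1.0, 1.0)):
    """Linearly map a rating from the source range to the target range."""
    lo, hi = src
    new_lo, new_hi = dst
    return new_lo + (value - lo) * (new_hi - new_lo) / (hi - lo)

rescale(5.0)    # -> 0.5
rescale(-10.0)  # -> -1.0
```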

Speech Emotion Recognition Data sets

RECOLA

  • Author: Ringeval, F., Sonderegger, A., Sauer, J., Lalanne, D.
  • Description: Participants were recorded in dyads during a video conference while completing a task requiring collaboration. Different multimodal data, i.e., audio, video, ECG and EDA, were recorded continuously and synchronously.
  • Annotation strategy: Six annotators measured emotion continuously on two dimensions, arousal and valence, as well as social behavior labels on five dimensions. The annotation was made with an open-source web-based annotation tool (ANNEMO); see the metric sketch after this list.
  • Paper: F. Ringeval, A. Sonderegger, J. Sauer and D. Lalanne, "Introducing the RECOLA Multimodal Corpus of Remote Collaborative and Affective Interactions", 2nd International Workshop on Emotion Representation, Analysis and Synthesis in Continuous Time and Space (EmoSPACE), in Proc. of IEEE Face & Gestures 2013, Shanghai (China), April 22-26 2013.
  • Annotation (dimensional): Time-continuous arousal and valence annotation (1 Hz)
  • Link: https://diuf.unifr.ch/main/diva/recola/download.html
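
Predictions against time-continuous annotations like RECOLA's are commonly scored with the concordance correlation coefficient (CCC), e.g. in the AVEC challenges built on this corpus. A self-contained sketch:

```python
import numpy as np

def ccc(x: np.ndarray, y: np.ndarray) -> float:
    """Concordance correlation coefficient between two traces,
    e.g. a predicted and a gold-standard arousal curve.
    1.0 means perfect agreement, 0.0 means none."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (x.var() + y.var() + (mx - my) ** 2)
```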

SEMAINE

  • Author: McKeown, G., Valstar, M., Cowie, R., Pantic, M., Schroder, M.
  • Description: The interactions involve two parties, a “user” (who is always human) and an “operator” (either a machine or a person simulating a machine). A key objective of the Solid SAL scenario was to record behaviors (mainly nonverbal) that a human operator shows in fluent face-to-face conversation, including their relationships to user behavior—notably backchanneling, eye contact, various synchronies, and so on.
  • Annotation strategy: For user clips, 4 sessions were annotated by 8 raters, 17 by 6, and the remainder by at least 3; for operator clips, 3 were annotated by 4 raters and the rest by a single rater.
  • Paper: McKeown, G., Valstar, M., Cowie, R., Pantic, M., Schroder, M. The SEMAINE Database: Annotated Multi-modal Records of Emotionally Colored Conversations between a Person and a Limited Agent. IEEE Transactions on Affective Computing. 2012; 3:5–17. https://doi.org/10.1109/T-AFFC.2011.20
  • Annotation (dimensional): Time-continuous arousal and valence annotation (1 Hz)
  • Link: https://semaine-db.eu/

SEWA

  • Author: Kossaifi, J., Walecki, R., Panagakis, Y., Shen, J., Schmitt, M., Ringeval, F., Han, J. et al.
  • Description: Subjects were recorded in two different contexts: while watching adverts and while discussing adverts in a video chat. The database includes rich annotations of the recordings in terms of facial landmarks, facial action units (FAU), various vocalisations, mirroring, and continuously valued valence, arousal, liking, agreement, and prototypic examples of (dis)liking.
  • Annotation strategy: 538 short (10-30s) video-chat recording segments were manually selected to form the fully annotated basic SEWA dataset, based on the subjects' emotional state (low/high valence, low/high arousal, and liking/disliking). All 6 cultures are evenly represented, with approximately 90 segments selected from each culture based on the consensus of at least 3 annotators from the same culture.
  • Paper: Kossaifi, J., Walecki, R., Panagakis, Y., Shen, J., Schmitt, M., Ringeval, F., Han, J. et al. "SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild." arXiv preprint arXiv:1901.02839 (2019).
  • Annotation (dimensional): Time-continuous valence, arousal, and liking.
  • Link: https://db.sewaproject.eu/

MMDBESP: Multimodal Database of Emotional Speech in Polish

  • Author: Sapiński T., Kamińska D., Pelikant A., Ozcinar C., Avots E., Anbarjafari G.
  • Description: Before the recording, all actors were presented with a short scenario describing the sequence of emotions they had to present in order: neutral state, sadness, surprise, fear, disgust, anger, happiness. In addition, they were asked to utter a short sentence in Polish, the same for all emotional states: “Każdy z nas odczuwa emocje na swój sposób” (English translation: “Each of us perceives emotions in a different manner”).
  • Annotation strategy: Enacted (acted) emotions, hence not ecologically valid.
  • Paper: Sapiński, T., Kamińska, D., Pelikant, A., Ozcinar, C., Avots, E., Anbarjafari, G. (2019) Multimodal Database of Emotional Speech, Video and Gestures. In: Zhang Z., Suter D., Tian Y., Branzan Albu A., Sidère N., Jair Escalante H. (eds) Pattern Recognition and Information Forensics. ICPR 2018. Lecture Notes in Computer Science, vol 11188
  • Annotation (categorical): Fear, Surprise, Anger, Sadness, Happiness, Disgust - basic emotions.
  • Link: https://competitions.codalab.org/competitions/22249