MER-Databases-and-Emotion-Ambiguity

The most popular databases used in multimodal emotion recognition, with a focus on the representation of emotional ambiguity.

SOTA - Multimodal Emotion Recognition Databases and Emotional Ambiguity

Main Contributions

✅ State of the art of databases used in Multimodal Emotion Recognition (MER) that comprise at least the visual and vocal modalities.

✅ Focus on the representation of ambiguity in emotional models for each database. We define emotional ambiguity as the difficulty humans have in identifying, expressing, and recognizing an emotion with certainty.

README Structure

  1. Emotional Models: Brief summary of the discrete and continuous emotion representations
  2. Databases with Discrete Emotions: General description and position with respect to the representation of emotional ambiguity
  3. Databases with Continuous Emotions: General description and position with respect to the representation of emotional ambiguity

Research Article

The full article describes the main emotional models and presents a literature review of the most widely used trimodal databases (image, voice, text), together with a study of how emotional ambiguity is represented.

Hélène Tran, Lisa Brelet, Issam Falih, Xavier Goblet, & Engelbert Mephu Nguifo (2022). L'ambiguïté dans la représentation des émotions : état de l'art des bases de données multimodales [Ambiguity in the representation of emotions: state of the art of multimodal databases]. Revue des Nouvelles Technologies de l'Information, Extraction et Gestion des Connaissances, RNTI-E-38, 87-98. https://editions-rnti.fr/?inprocid=1002719

Bonus! Our 3-minute presentation video (in French) is here: https://www.youtube.com/watch?v=pDQ1hpQJF8Q&list=PLApKApkHPMWHARfMnXMfYRyskRIv_huxM&index=10

Interested in our work?

Feel free to contact us at: helene.tran@doctorant.uca.fr

Please cite the following paper if our work is useful for your research:

```bibtex
@article{RNTI/papers/1002719,
  author  = {Hélène Tran and Lisa Brelet and Issam Falih and Xavier Goblet and Engelbert Mephu Nguifo},
  title   = {L'ambiguïté dans la représentation des émotions : état de l'art des bases de données multimodales},
  journal = {Revue des Nouvelles Technologies de l'Information},
  volume  = {Extraction et Gestion des Connaissances, RNTI-E-38},
  year    = {2022},
  pages   = {87--98}
}
```

Emotional Models

🔗 Anchor Links:

  1. Discrete Model: Description of discrete emotions and the main representations
  2. Continuous Model: Description of continuous emotions and the main representations

Discrete Model

Emotions are represented by discrete affective states. This approach is the most natural way for humans to define their own emotions. According to the evolutionary perspective, there are two main categories of emotions:

  • Primary emotions (or basic emotions): they are innate and universal, short-lived, and quickly triggered in response to an emotional stimulus. The list of primary emotions varies greatly depending on the discipline and the author. The most common sets are:

    • The six primary emotions of Ekman (1992): fear, anger, joy, sadness, disgust and surprise.

    • Plutchik's (2001) Wheel of Emotions, with eight primary emotions grouped into four pairs of opposites: joy and sadness, trust and disgust, fear and anger, anticipation and surprise.

    Plutchik's Wheel of Emotions

  • Secondary emotions (or complex emotions): they are derived from primary emotions, acquired through learning and confrontation with reality, and culture-dependent. For example, after feeling anger, we may then feel shame or guilt. Secondary emotions are usually combinations of primary emotions (Nugier, 2009).

Primary emotions are frequently chosen as classes for the development of recognition models, as this avoids a possible overlap of emotional states. Although the discrete approach remains widely favoured in research due to its intelligibility, limiting recognition to a single primary emotion is not sufficient to describe the whole spectrum of emotions.
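To make the single-label setting concrete, here is a minimal sketch (our illustration, not code from any database below) of Ekman's six primary emotions used as the label set of a discrete recognizer; forcing exactly one class per sample is precisely what leaves ambiguous or mixed states unexpressed:

```python
# Minimal sketch (illustrative): Ekman's six primary emotions as the
# label set of a discrete emotion classifier.
from enum import Enum

class PrimaryEmotion(Enum):
    FEAR = 0
    ANGER = 1
    JOY = 2
    SADNESS = 3
    DISGUST = 4
    SURPRISE = 5

# A discrete recognizer outputs exactly one class per sample, so a
# sentence that mixes, say, anger and sadness must be forced into one.
prediction = PrimaryEmotion.ANGER
print(prediction.name)  # ANGER
```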

Go to top

Continuous Model

Emotions are placed in a multidimensional space. A large majority of researchers working on the continuous model agree on the following two dimensions:

  • Valence corresponds to the pleasantness of the emotion. For example, disgust is an unpleasant emotion and is therefore associated with a negative valence. In contrast, a positive valence is related to a pleasant emotion such as joy.

  • Arousal (or activation) refers to the physiological intensity felt. Sadness, which leads to withdrawal, is associated with low arousal. Anger, in contrast, is associated with high arousal: it releases a surge of energy that prepares the individual to fight.

These two dimensions are not sufficient to characterize an emotion (e.g. both anger and fear have a negative valence and high arousal; the sketch at the end of this section illustrates this), hence the need for a third dimension. However, there is no consensus in the literature on what this dimension should be:

  • According to Russell and Mehrabian (1977), it could correspond to control: an angry individual feels in control of the situation (positive control), while a person overwhelmed by sadness sees the situation slipping away (negative control). The resulting model is called PAD (Pleasure, Arousal, Dominance) and, according to the authors, is sufficient to describe all emotions.

The Pleasure-Arousal-Dominance model (Source: Buechel and Hahn, 2016)


The Pleasure-Arousal-Dominance model, French version (based on Karg et al., 2013)

  • According to Schlosberg (1954), it should be the attention-rejection dimension: attention corresponds to actively taking in the external stimulus, while rejection corresponds to a strong attempt to shut it out.

In addition to offering better temporal resolution, the continuous approach can represent a wide range of emotional states and capture how they vary over time (Gunes et al., 2011). These advantages have led to a growing interest in it within affective computing.
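As a minimal sketch (the coordinates are illustrative, not taken from any of the models above), the anger/fear example can be written as two PAD points that differ only in the third dimension:

```python
# Minimal sketch (illustrative coordinates): emotions as points in the
# Pleasure/Valence-Arousal-Dominance (PAD) space, each axis in [-1, 1].
from dataclasses import dataclass

@dataclass
class PADPoint:
    valence: float    # unpleasant (-1) .. pleasant (+1)
    arousal: float    # calm (-1) .. activated (+1)
    dominance: float  # overwhelmed (-1) .. in control (+1)

# Anger and fear share a negative valence and a high arousal;
# only dominance (control) separates them.
anger = PADPoint(valence=-0.6, arousal=0.8, dominance=0.7)
fear = PADPoint(valence=-0.6, arousal=0.8, dominance=-0.7)
```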

Go to top

Databases with Discrete Emotions

🔗 Anchor Links:

  1. Description of the Databases: General information about the databases that use discrete emotions
  2. Databases and Emotional Ambiguity: Position of the databases with respect to the representation of emotional ambiguity

🚩 Databases marked with a * offer discrete and continuous representation of emotions.

Description of the Databases - Discrete

| Name | Year | Language | Modalities | Classes | Number of sentences | Description |
|---|---|---|---|---|---|---|
| CMU-MOSEAS (paper) | 2020 | French, Spanish, Portuguese, German | Vocal, Visual, Textual | Happiness, Anger, Sadness, Fear, Disgust, Surprise | 10000 per language | Monologue videos from YouTube. High diversity of topics covered by a large number of speakers. Annotations on sentiments, emotions, subjectivity and 12 personality traits. Each emotion is annotated on a [0,3] Likert scale for its degree of presence. |
| CMU-MOSEI (paper) | 2018 | English | Vocal, Visual, Textual | Happiness, Anger, Sadness, Fear, Disgust, Surprise | 23453 | Monologue videos from YouTube. High diversity of topics covered by a large number of speakers. Annotations on emotions and sentiments. Each emotion is annotated on a [0,3] Likert scale for its degree of presence. |
| MELD (paper) | 2018 | English | Vocal, Visual, Textual | Happiness, Anger, Sadness, Fear, Disgust, Surprise, Neutral | 13708 | Clips from the TV series Friends. One of the largest databases involving more than two people in a conversation. |
| OMG-Emotions* (paper) | 2018 | English | Vocal, Visual, Textual | Happiness, Anger, Sadness, Fear, Disgust, Surprise, Neutral | 2400 | Monologue videos from YouTube. Uses gradual annotations with a focus on contextual emotion expressions. |
| RAVDESS (paper) | 2018 | English | Vocal, Visual | For speech: Calm, Happy, Sad, Angry, Fearful, Surprise, Disgust, Neutral. For song: Calm, Happy, Sad, Angry, Fearful, Neutral | For each modality (vocal only, visual only, audio-visual): 1440 for speech, 1012 for song | Isolated sentences uttered by professional actors in a studio. Designed for emotion recognition in speech and song. |
| GEMEP (paper full set, paper core set) | 2010 | French | Vocal, Visual | Admiration, amusement, tenderness, anger, disgust, despair, pride, shame, anxiety, interest, irritation, joy, contempt, fear, pleasure, relief, surprise, sadness | 1260 (full set) / 154 (core set) portrayals¹ | Professional actors playing out emotional scenarios. Emotions are chosen so that all valence-arousal combinations are represented (positive/negative valence, high/low arousal). |
| IEMOCAP* (paper) | 2008 | English | Vocal, Visual, Textual, Markers on face, head and hands | Happiness, Anger, Sadness, Neutral, Frustration | 10039 | Emotions are played out by professional actors. Widely used in affective computing research. |
| eNTERFACE'05 (paper) | 2006 | English | Vocal, Visual | Happiness, Anger, Sadness, Fear, Disgust, Surprise, Neutral | 1166 | Isolated sentences uttered by naive subjects from 14 nations. Mood induction by listening to short stories. Black background. |
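The CMU-MOSEI and CMU-MOSEAS scheme above, where each emotion receives a presence score on a [0,3] Likert scale, already encodes ambiguity by letting several emotions co-occur on one sentence. A minimal sketch with hypothetical scores (our illustration, not real data):

```python
# Hypothetical CMU-MOSEI-style sentence annotation (illustrative values):
# each emotion carries a presence score on a [0, 3] Likert scale,
# so several emotions can co-occur on the same sentence.
annotation = {
    "happiness": 0.0,
    "anger": 2.0,
    "sadness": 1.0,
    "fear": 0.0,
    "disgust": 1.0,
    "surprise": 0.0,
}

# Emotions considered present (score > 0) for this sentence:
present = [emotion for emotion, score in annotation.items() if score > 0]
print(present)  # ['anger', 'sadness', 'disgust']
```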

Go to top

Databases and Emotional Ambiguity - Discrete

| Name | Year | Final annotation | # annotations per sentence | Aggregation of annotations |
|---|---|---|---|---|
| CMU-MOSEAS | 2020 | Multiple emotions possible per sentence | 3 | - |
| CMU-MOSEI | 2018 | Multiple emotions possible per sentence | 3 | - |
| MELD | 2018 | One emotion per sentence | 3 | Majority vote |
| OMG-Emotions* | 2018 | One emotion per sentence | 5 | Majority vote |
| RAVDESS | 2018 | One emotion per sentence with 2 intensities (normal, strong) | 10 | - |
| GEMEP | 2010 | One or two emotion(s) per portrayal with 4 intensities (full set) / one emotion per portrayal with 5 intensities (core set) | 23 (audio-video), 23 (audio only), 25 (video only) | - |
| IEMOCAP* | 2008 | One emotion per sentence | 3 | Majority vote |
| eNTERFACE'05 | 2006 | One emotion per sentence | No annotation | - |
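Majority vote, used by MELD, OMG-Emotions and IEMOCAP, keeps the label chosen most often and thereby discards the ambiguity signal carried by annotator disagreement. A minimal sketch (our illustration; how ties are handled varies by database):

```python
from collections import Counter

def majority_vote(labels):
    """Aggregate the annotations of one sentence by majority vote.

    Returns the winning label, or None when no label reaches a strict
    majority (databases handle this case differently, e.g. by
    discarding the sentence or adding annotators).
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

print(majority_vote(["anger", "anger", "disgust"]))  # anger
print(majority_vote(["anger", "joy", "disgust"]))    # None (no majority)
```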

Go to top

Databases with Continuous Emotions

🔗 Anchor Links:

  1. Description of the Databases: General information about the databases that use continuous emotions
  2. Databases and Emotional Ambiguity: Position of the databases with respect to the representation of emotional ambiguity

🚩 Databases marked with a * offer discrete and continuous representation of emotions.

Description of the Databases - Continuous

| Name | Year | Language | Modalities | Dimensions | Number of sentences | Description |
|---|---|---|---|---|---|---|
| MuSe-CaR (paper) | 2021 | English | Vocal, Visual, Textual | Valence, Arousal, Trustworthiness | 28295 | Car reviews from YouTube. In-the-wild characteristics (e.g. reviewer visibility, ambient noise, domain-specific terms). Designed for sentiment recognition, emotion-target engagement and trustworthiness detection. |
| SEWA (paper) | 2019 | English, German, Hungarian, Greek, Serbian, Chinese | Vocal, Visual, Textual | Valence, Arousal, Liking | 5382² | Ordinary people from the same culture discuss advertisements via video conference. Created for emotion recognition but also for human behavior analysis, including cultural studies. |
| OMG-Emotions* (paper) | 2018 | English | Vocal, Visual, Textual | Valence, Arousal | 2400 | Monologue videos from YouTube. Uses gradual annotations with a focus on contextual emotion expressions. |
| RECOLA (paper) | 2013 | French | Vocal, Visual, ECG³, EDA⁴ | Valence, Arousal | 3.8 hours¹ | Online dyadic interactions where participants collaborate to solve a survival task. Mood induction procedure to elicit emotions. |
| SEMAINE (paper) | 2012 | English | Vocal, Visual, Textual | Valence, Arousal, Expectation, Power, Intensity | 190 videos¹ | Volunteers interact with an artificial character to which a personality trait has been assigned (angry, happy, gloomy, sensible). |
| IEMOCAP* (paper) | 2008 | English | Vocal, Visual, Textual, Markers on face, head and hands | Valence, Arousal, Dominance | 10039 | Emotions are played out by professional actors. Widely used in affective computing research. |
| SAL (paper) | 2008 | English | Vocal, Visual | Valence, Arousal | 1692 turns¹ | Natural human-SAL conversations involving artificial characters with different emotional personalities (angry, sad, gloomy, sensitive). |
| VAM (paper) | 2008 | German | Vocal, Visual | Valence, Arousal, Dominance | 1018 | Videos from a German TV talk show: spontaneous emotions from unscripted discussions. Audio and face annotated separately. |

Go to top

Databases and Emotional Ambiguity - Continuous

| Name | Year | Final annotation | # annotations per sentence | Aggregation of annotations |
|---|---|---|---|---|
| MuSe-CaR | 2021 | Point in the emotional space | 5 | Evaluator Weighted Estimator |
| SEWA | 2019 | Point in the emotional space | At least 3 | Canonical Time Warping |
| OMG-Emotions* | 2018 | Point in the emotional space | 5 | Evaluator Weighted Estimator |
| RECOLA | 2013 | Point in the emotional space | 6 | Mean filtering |
| SEMAINE | 2012 | Point in the emotional space | 3 - 8 | - |
| IEMOCAP* | 2008 | Single element from SAM⁵ | 2 | Z-normalisation |
| SAL | 2008 | Point in the emotional space | 4 | Z-normalisation |
| VAM | 2008 | Single element from SAM⁵ | 6 - 17 (audio); 8 - 34 (face) | - |
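The Evaluator Weighted Estimator used by MuSe-CaR and OMG-Emotions fuses the raters' continuous traces into one, weighting each rater by how well their trace correlates with the others so that unreliable raters contribute less. A minimal NumPy sketch of one common leave-one-out formulation (our illustration; the exact weighting varies across databases):

```python
import numpy as np

def evaluator_weighted_estimator(traces):
    """Fuse continuous annotations (n_raters x n_frames) into one trace.

    Each rater is weighted by the Pearson correlation between their
    trace and the mean trace of the remaining raters; negative
    correlations are clipped to zero so unreliable raters drop out.
    """
    traces = np.asarray(traces, dtype=float)
    weights = np.empty(len(traces))
    for k in range(len(traces)):
        others = np.delete(traces, k, axis=0).mean(axis=0)
        weights[k] = np.corrcoef(traces[k], others)[0, 1]
    weights = np.clip(weights, 0.0, None)
    return weights @ traces / weights.sum()

# Three hypothetical valence traces; the third rater disagrees with the
# other two and therefore receives a weight of zero after clipping.
raters = [[0.1, 0.3, 0.5, 0.4],
          [0.2, 0.4, 0.6, 0.5],
          [0.5, 0.2, 0.3, 0.6]]
print(evaluator_weighted_estimator(raters))  # [0.15 0.35 0.55 0.45]
```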

Go to top

References

Buechel, S. and U. Hahn (2016). Emotion Analysis as a Regression Problem — Dimensional Models and Their Implications on Emotion Representation and Metrical Evaluation. In ECAI 2016, pp. 1114–1122. IOS Press. https://doi.org/10.3233/978-1-61499-672-9-1114

Ekman, P. (1992). An Argument for Basic Emotions. Cognition & Emotion 6(3-4), 169–200. https://doi.org/10.1080/02699939208411068

Gunes, H., B. Schuller, M. Pantic, and R. Cowie (2011). Emotion representation, analysis and synthesis in continuous space: A survey. In 2011 IEEE International Conference on Automatic Face & Gesture Recognition (FG), pp. 827–834. IEEE. https://doi.org/10.1109/FG.2011.5771357

Karg, M., A. A. Samadani, R. Gorbet, K. Kühnlenz, J. Hoey, and D. Kulić (2013). Body movements for affective expression: A survey of automatic recognition and generation. IEEE Transactions on Affective Computing 4(4), 341–359.

Nugier, A. (2009). Histoire et grands courants de recherche sur les émotions. Revue électronique de Psychologie Sociale 4(4), 8–14.

Plutchik, R. (2001). The Nature of Emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. American Scientist 89(4), 344–350. https://www.jstor.org/stable/27857503

Russell, J. A. and A. Mehrabian (1977). Evidence for a Three-Factor Theory of Emotions. Journal of Research in Personality 11(3), 273–294. https://doi.org/10.1016/0092-6566(77)90037-X

Schlosberg, H. (1954). Three dimensions of emotion. Psychological Review, 61(2), 81–88. https://doi.org/10.1037/h0054570

Footnotes

  1. No information provided on the number of annotated sentences.

  2. The paper does not specify whether it is all languages combined or a specific language.

  3. ECG stands for electrocardiogram.

  4. EDA stands for electrodermal activity.

  5. SAM stands for Self-Assessment Manikin.