MER-Databases-and-Emotion-Ambiguity

The most popular databases used in multimodal emotion recognition, with a focus on the representation of emotional ambiguity.

SOTA - Multimodal Emotion Recognition Databases and Emotional Ambiguity

Main Contributions

✅ State of the art of databases used in Multimodal Emotion Recognition (MER) that comprise at least the visual and vocal modalities.

✅ Focus on the representation of ambiguity in emotional models for each database. We define emotional ambiguity as the difficulty humans have in identifying, expressing, and recognizing an emotion with certainty.

README Structure

  1. Emotional Models: Brief summary of the discrete and continuous emotion representations
  2. Databases with Discrete Emotions: General description and position with respect to the representation of emotional ambiguity
  3. Databases with Continuous Emotions: General description and position with respect to the representation of emotional ambiguity

Research Article

The full article describes the main emotional models and presents a literature review of the most widely used trimodal databases (image, voice, text), together with a study of how emotional ambiguity is represented.

Hélène Tran, Lisa Brelet, Issam Falih, Xavier Goblet, & Engelbert Mephu Nguifo (2022). L'ambiguïté dans la représentation des émotions : état de l'art des bases de données multimodales [Ambiguity in the representation of emotions: state of the art of multimodal databases]. Revue des Nouvelles Technologies de l'Information, Extraction et Gestion des Connaissances, RNTI-E-38, 87-98. https://editions-rnti.fr/?inprocid=1002719

Bonus! Our 3-minute presentation video (in French) is here: https://www.youtube.com/watch?v=pDQ1hpQJF8Q&list=PLApKApkHPMWHARfMnXMfYRyskRIv_huxM&index=10

Interested in our work?

Feel free to contact us at: helene.tran@doctorant.uca.fr

Please cite the following paper if our work is useful for your research:

```bibtex
@article{RNTI/papers/1002719,
  author  = {Hélène Tran and Lisa Brelet and Issam Falih and Xavier Goblet and Engelbert Mephu Nguifo},
  title   = {L'ambiguïté dans la représentation des émotions : état de l'art des bases de données multimodales},
  journal = {Revue des Nouvelles Technologies de l'Information},
  volume  = {Extraction et Gestion des Connaissances, RNTI-E-38},
  year    = {2022},
  pages   = {87--98}
}
```

Emotional Models

🔗 Anchor Links:

  1. Discrete Model: Description of discrete emotions and the main representations
  2. Continuous Model: Description of continuous emotions and the main representations

Discrete Model

Emotions are represented by discrete affective states. This approach is the most natural way for humans to define their own emotions. According to the evolutionary perspective, there are two main categories of emotions:

  • Primary emotions (or basic emotions): they are innate and universal, short-lived, and quickly triggered in response to an emotional stimulus. The list of primary emotions varies greatly depending on the discipline and the author. The most common sets are:

    • The six primary emotions of Ekman (1992): fear, anger, joy, sadness, disgust and surprise.

    • Plutchik's (2001) Wheel of Emotions, with eight primary emotions grouped into four pairs of opposites: joy and sadness, trust and disgust, fear and anger, anticipation and surprise.

    Plutchik's Wheel of Emotions

  • Secondary emotions (or complex emotions): they are derived from primary emotions, acquired through learning and confrontation with reality, and culture-dependent. For example, after feeling anger, we may then feel shame or guilt. Secondary emotions are usually combinations of primary emotions (Nugier, 2009).

Primary emotions are frequently chosen as classes for the development of recognition models, as this avoids a possible overlap of emotional states. Although the discrete approach remains widely favoured in research due to its intelligibility, limiting recognition to a single primary emotion is not sufficient to describe the whole spectrum of emotions.
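To make the single-label setting concrete, here is a minimal sketch (our illustration, not code from any database below) of Ekman's six primary emotions used as the label set of a discrete recognizer; forcing exactly one class per sample is precisely what leaves ambiguous or mixed states unexpressed:

```python
# Minimal sketch (illustrative): Ekman's six primary emotions as the
# label set of a discrete emotion classifier.
from enum import Enum

class PrimaryEmotion(Enum):
    FEAR = 0
    ANGER = 1
    JOY = 2
    SADNESS = 3
    DISGUST = 4
    SURPRISE = 5

# A discrete recognizer outputs exactly one class per sample, so a
# sentence that mixes, say, anger and sadness must be forced into one.
prediction = PrimaryEmotion.ANGER
print(prediction.name)  # ANGER
```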

Go to top

Continuous Model

Emotions are placed in a multidimensional space. A large majority of researchers working on the continuous model agree on the following two dimensions:

  • Valence corresponds to the pleasantness of the emotion. For example, disgust is an unpleasant emotion and is therefore associated with a negative valence. In contrast, a positive valence is related to a pleasant emotion such as joy.

  • Arousal (or activation) refers to the physiological intensity felt. Sadness, which leads to withdrawal, is associated with low arousal. Anger, in contrast, is associated with high arousal: it releases a surge of energy that prepares the individual to fight.

These two dimensions are not sufficient to characterize an emotion (e.g. both anger and fear have a negative valence and high arousal; the sketch at the end of this section illustrates this), hence the need for a third dimension. However, there is no consensus in the literature on what this dimension should be:

  • According to Russell and Mehrabian (1977), it could correspond to control: an angry individual feels in control of the situation (positive control), while a person overwhelmed by sadness sees the situation slipping away (negative control). The resulting model is called PAD (Pleasure, Arousal, Dominance) and, according to the authors, is sufficient to describe all emotions.

The Pleasure-Arousal-Dominance model (Source: Buechel and Hahn, 2016)


The Pleasure-Arousal-Dominance model, French version (based on Karg et al., 2013)

  • According to Schlosberg (1954), it should be the attention-rejection dimension: attention corresponds to actively taking in the external stimulus, while rejection corresponds to a strong attempt to shut it out.

In addition to offering better temporal resolution, the continuous approach can represent a wide range of emotional states and capture how they vary over time (Gunes et al., 2011). These advantages have led to a growing interest in it within affective computing.
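As a minimal sketch (the coordinates are illustrative, not taken from any of the models above), the anger/fear example can be written as two PAD points that differ only in the third dimension:

```python
# Minimal sketch (illustrative coordinates): emotions as points in the
# Pleasure/Valence-Arousal-Dominance (PAD) space, each axis in [-1, 1].
from dataclasses import dataclass

@dataclass
class PADPoint:
    valence: float    # unpleasant (-1) .. pleasant (+1)
    arousal: float    # calm (-1) .. activated (+1)
    dominance: float  # overwhelmed (-1) .. in control (+1)

# Anger and fear share a negative valence and a high arousal;
# only dominance (control) separates them.
anger = PADPoint(valence=-0.6, arousal=0.8, dominance=0.7)
fear = PADPoint(valence=-0.6, arousal=0.8, dominance=-0.7)
```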

Go to top

Databases with Discrete Emotions

🔗 Anchor Links:

  1. Description of the Databases: General information about the databases that use discrete emotions
  2. Databases and Emotional Ambiguity: Position of the databases with respect to the representation of emotional ambiguity

🚩 Databases marked with a * offer discrete and continuous representation of emotions.

Description of the Databases - Discrete

| Name | Year | Language | Modalities | Classes | Number of sentences | Description |
|---|---|---|---|---|---|---|
| CMU-MOSEAS (paper) | 2020 | French, Spanish, Portuguese, German | Vocal, Visual, Textual | Happiness, Anger, Sadness, Fear, Disgust, Surprise | 10000 per language | Monologue videos from YouTube. High diversity of topics covered by a large number of speakers. Annotations on sentiments, emotions, subjectivity and 12 personality traits. Each emotion is annotated on a [0,3] Likert scale for its degree of presence. |
| CMU-MOSEI (paper) | 2018 | English | Vocal, Visual, Textual | Happiness, Anger, Sadness, Fear, Disgust, Surprise | 23453 | Monologue videos from YouTube. High diversity of topics covered by a large number of speakers. Annotations on emotions and sentiments. Each emotion is annotated on a [0,3] Likert scale for its degree of presence. |
| MELD (paper) | 2018 | English | Vocal, Visual, Textual | Happiness, Anger, Sadness, Fear, Disgust, Surprise, Neutral | 13708 | Clips from the TV series Friends. One of the largest databases involving more than two people in a conversation. |
| OMG-Emotions* (paper) | 2018 | English | Vocal, Visual, Textual | Happiness, Anger, Sadness, Fear, Disgust, Surprise, Neutral | 2400 | Monologue videos from YouTube. Uses gradual annotations with a focus on contextual emotion expressions. |
| RAVDESS (paper) | 2018 | English | Vocal, Visual | For speech: Calm, Happy, Sad, Angry, Fearful, Surprise, Disgust, Neutral. For song: Calm, Happy, Sad, Angry, Fearful, Neutral | For each modality (vocal only, visual only, audio-visual): 1440 for speech, 1012 for song | Isolated sentences uttered by professional actors in a studio. Designed for emotion recognition in speech and song. |
| GEMEP (paper full set, paper core set) | 2010 | French | Vocal, Visual | Admiration, amusement, tenderness, anger, disgust, despair, pride, shame, anxiety, interest, irritation, joy, contempt, fear, pleasure, relief, surprise, sadness | 1260 (full set) / 154 (core set) portrayals¹ | Professional actors playing out emotional scenarios. Emotions are chosen so that all valence-arousal combinations are represented (positive/negative valence, high/low arousal). |
| IEMOCAP* (paper) | 2008 | English | Vocal, Visual, Textual, Markers on face, head and hands | Happiness, Anger, Sadness, Neutral, Frustration | 10039 | Emotions are played out by professional actors. Widely used in affective computing research. |
| eNTERFACE'05 (paper) | 2006 | English | Vocal, Visual | Happiness, Anger, Sadness, Fear, Disgust, Surprise, Neutral | 1166 | Isolated sentences uttered by naive subjects from 14 nations. Mood induction by listening to short stories. Black background. |
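The CMU-MOSEI and CMU-MOSEAS scheme above, where each emotion receives a presence score on a [0,3] Likert scale, already encodes ambiguity by letting several emotions co-occur on one sentence. A minimal sketch with hypothetical scores (our illustration, not real data):

```python
# Hypothetical CMU-MOSEI-style sentence annotation (illustrative values):
# each emotion carries a presence score on a [0, 3] Likert scale,
# so several emotions can co-occur on the same sentence.
annotation = {
    "happiness": 0.0,
    "anger": 2.0,
    "sadness": 1.0,
    "fear": 0.0,
    "disgust": 1.0,
    "surprise": 0.0,
}

# Emotions considered present (score > 0) for this sentence:
present = [emotion for emotion, score in annotation.items() if score > 0]
print(present)  # ['anger', 'sadness', 'disgust']
```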

Go to top

Databases and Emotional Ambiguity - Discrete

| Name | Year | Final annotation | # annotations per sentence | Aggregation of annotations |
|---|---|---|---|---|
| CMU-MOSEAS | 2020 | Multiple emotions possible per sentence | 3 | - |
| CMU-MOSEI | 2018 | Multiple emotions possible per sentence | 3 | - |
| MELD | 2018 | One emotion per sentence | 3 | Majority vote |
| OMG-Emotions* | 2018 | One emotion per sentence | 5 | Majority vote |
| RAVDESS | 2018 | One emotion per sentence with 2 intensities (normal, strong) | 10 | - |
| GEMEP | 2010 | One or two emotion(s) per portrayal with 4 intensities (full set) / one emotion per portrayal with 5 intensities (core set) | 23 (audio-video), 23 (audio only), 25 (video only) | - |
| IEMOCAP* | 2008 | One emotion per sentence | 3 | Majority vote |
| eNTERFACE'05 | 2006 | One emotion per sentence | No annotation | - |
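Majority vote, used by MELD, OMG-Emotions and IEMOCAP, keeps the label chosen most often and thereby discards the ambiguity signal carried by annotator disagreement. A minimal sketch (our illustration; how ties are handled varies by database):

```python
from collections import Counter

def majority_vote(labels):
    """Aggregate the annotations of one sentence by majority vote.

    Returns the winning label, or None when no label reaches a strict
    majority (databases handle this case differently, e.g. by
    discarding the sentence or adding annotators).
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

print(majority_vote(["anger", "anger", "disgust"]))  # anger
print(majority_vote(["anger", "joy", "disgust"]))    # None (no majority)
```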

Go to top

Databases with Continuous Emotions

🔗 Anchor Links:

  1. Description of the Databases: General information about the databases that use continuous emotions
  2. Databases and Emotional Ambiguity: Position of the databases with respect to the representation of emotional ambiguity

🚩 Databases marked with a * offer discrete and continuous representation of emotions.

Description of the Databases - Continuous

| Name | Year | Language | Modalities | Dimensions | Number of sentences | Description |
|---|---|---|---|---|---|---|
| MuSe-CaR (paper) | 2021 | English | Vocal, Visual, Textual | Valence, Arousal, Trustworthiness | 28295 | Car reviews from YouTube. In-the-wild characteristics (e.g. reviewer visibility, ambient noise, domain-specific terms). Designed for sentiment recognition, emotion-target engagement and trustworthiness detection. |
| SEWA (paper) | 2019 | English, German, Hungarian, Greek, Serbian, Chinese | Vocal, Visual, Textual | Valence, Arousal, Liking | 5382² | Ordinary people from the same culture discuss advertisements via video conference. Created for emotion recognition but also for human behavior analysis, including cultural studies. |
| OMG-Emotions* (paper) | 2018 | English | Vocal, Visual, Textual | Valence, Arousal | 2400 | Monologue videos from YouTube. Uses gradual annotations with a focus on contextual emotion expressions. |
| RECOLA (paper) | 2013 | French | Vocal, Visual, ECG³, EDA⁴ | Valence, Arousal | 3.8 hours¹ | Online dyadic interactions where participants collaborate to solve a survival task. Mood induction procedure to elicit emotions. |
| SEMAINE (paper) | 2012 | English | Vocal, Visual, Textual | Valence, Arousal, Expectation, Power, Intensity | 190 videos¹ | Volunteers interact with an artificial character to which a personality trait has been assigned (angry, happy, gloomy, sensible). |
| IEMOCAP* (paper) | 2008 | English | Vocal, Visual, Textual, Markers on face, head and hands | Valence, Arousal, Dominance | 10039 | Emotions are played out by professional actors. Widely used in affective computing research. |
| SAL (paper) | 2008 | English | Vocal, Visual | Valence, Arousal | 1692 turns¹ | Natural human-SAL conversations involving artificial characters with different emotional personalities (angry, sad, gloomy, sensitive). |
| VAM (paper) | 2008 | German | Vocal, Visual | Valence, Arousal, Dominance | 1018 | Videos from a German TV talk show: spontaneous emotions from unscripted discussions. Audio and face annotated separately. |

Go to top

Databases and Emotional Ambiguity - Continuous

| Name | Year | Final annotation | # annotations per sentence | Aggregation of annotations |
|---|---|---|---|---|
| MuSe-CaR | 2021 | Point in the emotional space | 5 | Evaluator Weighted Estimator |
| SEWA | 2019 | Point in the emotional space | At least 3 | Canonical Time Warping |
| OMG-Emotions* | 2018 | Point in the emotional space | 5 | Evaluator Weighted Estimator |
| RECOLA | 2013 | Point in the emotional space | 6 | Mean filtering |
| SEMAINE | 2012 | Point in the emotional space | 3 - 8 | - |
| IEMOCAP* | 2008 | Single element from SAM⁵ | 2 | Z-normalisation |
| SAL | 2008 | Point in the emotional space | 4 | Z-normalisation |
| VAM | 2008 | Single element from SAM⁵ | 6 - 17 (audio); 8 - 34 (face) | - |
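The Evaluator Weighted Estimator used by MuSe-CaR and OMG-Emotions fuses the raters' continuous traces into one, weighting each rater by how well their trace correlates with the others so that unreliable raters contribute less. A minimal NumPy sketch of one common leave-one-out formulation (our illustration; the exact weighting varies across databases):

```python
import numpy as np

def evaluator_weighted_estimator(traces):
    """Fuse continuous annotations (n_raters x n_frames) into one trace.

    Each rater is weighted by the Pearson correlation between their
    trace and the mean trace of the remaining raters; negative
    correlations are clipped to zero so unreliable raters drop out.
    """
    traces = np.asarray(traces, dtype=float)
    weights = np.empty(len(traces))
    for k in range(len(traces)):
        others = np.delete(traces, k, axis=0).mean(axis=0)
        weights[k] = np.corrcoef(traces[k], others)[0, 1]
    weights = np.clip(weights, 0.0, None)
    return weights @ traces / weights.sum()

# Three hypothetical valence traces; the third rater disagrees with the
# other two and therefore receives a weight of zero after clipping.
raters = [[0.1, 0.3, 0.5, 0.4],
          [0.2, 0.4, 0.6, 0.5],
          [0.5, 0.2, 0.3, 0.6]]
print(evaluator_weighted_estimator(raters))  # [0.15 0.35 0.55 0.45]
```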

Go to top

References

Buechel, S. and U. Hahn (2016). Emotion Analysis as a Regression Problem — Dimensional Models and Their Implications on Emotion Representation and Metrical Evaluation. In ECAI 2016, pp. 1114–1122. IOS Press. https://doi.org/10.3233/978-1-61499-672-9-1114

Ekman, P. (1992). An Argument for Basic Emotions. Cognition & Emotion 6(3-4), 169–200. https://doi.org/10.1080/02699939208411068

Gunes, H., B. Schuller, M. Pantic, and R. Cowie (2011). Emotion representation, analysis and synthesis in continuous space: A survey. In 2011 IEEE International Conference on Automatic Face & Gesture Recognition (FG), pp. 827–834. IEEE. https://doi.org/10.1109/FG.2011.5771357

Karg, M., A. A. Samadani, R. Gorbet, K. Kühnlenz, J. Hoey, and D. Kulić (2013). Body movements for affective expression: A survey of automatic recognition and generation. IEEE Transactions on Affective Computing 4(4), 341–359.

Nugier, A. (2009). Histoire et grands courants de recherche sur les émotions. Revue électronique de Psychologie Sociale 4(4), 8–14.

Plutchik, R. (2001). The Nature of Emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. American Scientist 89(4), 344–350. https://www.jstor.org/stable/27857503

Russell, J. A. and A. Mehrabian (1977). Evidence for a Three-Factor Theory of Emotions. Journal of Research in Personality 11(3), 273–294. https://doi.org/10.1016/0092-6566(77)90037-X

Schlosberg, H. (1954). Three dimensions of emotion. Psychological Review, 61(2), 81–88. https://doi.org/10.1037/h0054570

Footnotes

  1. No information provided on the number of annotated sentences.

  2. The paper does not specify whether it is all languages combined or a specific language.

  3. ECG stands for electrocardiogram.

  4. EDA stands for electrodermal activity.

  5. SAM stands for Self-Assessment Manikin.