

Primary language: Jupyter Notebook. License: MIT.

NLP-Project

Task:

Recognize emotion directly from speech audio, without an intermediate transcription step.

Why go directly from speech to emotion?

  1. Traditional systems first transcribe the audio and then classify the text, so any errors made during transcription propagate into, and degrade, the emotion recognition step.
  2. Transcription discards important acoustic features such as pitch and loudness.
  3. Text alone can be emotionally ambiguous. For example, the phrase ‘What a man!’ can indicate surprise or even disgust.
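To make point 2 concrete, here is a minimal sketch of two acoustic cues that a transcript cannot carry: RMS energy as a loudness proxy and an autocorrelation-based pitch estimate. It assumes NumPy and a mono frame already loaded as a float array; the function names and the 50 to 500 Hz search range are illustrative choices, not taken from this project's notebook.

```python
import numpy as np


def rms_loudness(frame: np.ndarray) -> float:
    """Root-mean-square energy of one frame, used here as a simple loudness proxy."""
    return float(np.sqrt(np.mean(frame ** 2)))


def autocorr_pitch(frame: np.ndarray, sr: int,
                   fmin: float = 50.0, fmax: float = 500.0) -> float:
    """Estimate the fundamental frequency (Hz) of a voiced frame by picking
    the strongest autocorrelation peak among lags in [sr/fmax, sr/fmin]."""
    frame = frame - np.mean(frame)  # remove DC offset
    # Full autocorrelation; keep only the non-negative lags.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(ac) - 1)
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / best_lag


if __name__ == "__main__":
    sr = 16000
    t = np.arange(int(0.04 * sr)) / sr          # one 40 ms frame
    tone = 0.5 * np.sin(2 * np.pi * 200.0 * t)  # synthetic 200 Hz "voice"
    print(round(autocorr_pitch(tone, sr)))      # → 200
    print(round(rms_loudness(tone), 3))         # → 0.354
```

A production pipeline would compute these per frame over the whole clip (and typically add spectral features such as MFCCs), but the idea is the same: the model sees how something was said, not only what was said.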

Datasets used:

  1. The Ryerson Audio-Visual Database of Emotional Speech and Song - RAVDESS

Only the audio tracks have been used.

Dataset description:

○ Gender-balanced set of 24 professional actors.
○ 7 emotions in total: calm, happy, sad, angry, fearful, surprise, and disgust expressions.
○ Each expression is produced at two levels of emotional intensity, with an additional neutral expression.
○ 1250 samples in total.
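RAVDESS encodes its labels in each file name as seven two-digit fields (modality, vocal channel, emotion, intensity, statement, repetition, actor), so labels can be recovered without a separate annotation file. The sketch below follows that published naming convention; `parse_ravdess_name` and `RavdessLabel` are hypothetical helpers, not code from this repository.

```python
from pathlib import Path
from typing import NamedTuple

# Emotion codes from the RAVDESS file-naming convention (third field).
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


class RavdessLabel(NamedTuple):
    emotion: str
    strong_intensity: bool  # intensity field "02" means strong, "01" normal
    actor: int
    female: bool


def parse_ravdess_name(path: str) -> RavdessLabel:
    """Parse a name like '03-01-06-01-02-01-12.wav' into its labels."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"unexpected RAVDESS file name: {path}")
    _modality, _channel, emotion, intensity, _statement, _repetition, actor = fields
    actor_id = int(actor)
    return RavdessLabel(
        emotion=RAVDESS_EMOTIONS[emotion],
        strong_intensity=(intensity == "02"),
        actor=actor_id,
        female=(actor_id % 2 == 0),  # even-numbered actors are female
    )


print(parse_ravdess_name("Actor_12/03-01-06-01-02-01-12.wav"))
# → RavdessLabel(emotion='fearful', strong_intensity=False, actor=12, female=True)
```

Because the actor ID is part of the name, the same parser also makes it easy to split by speaker rather than by clip, if a speaker-independent test set is wanted.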

Test accuracy obtained: 73.5%
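The README does not state how the test split was drawn. One common approach is a stratified hold-out, so each emotion appears in the test set in proportion to its frequency, with accuracy defined as the fraction of test clips whose predicted label matches the reference. A minimal pure-Python sketch under that assumption; the function names and the 20% test fraction are illustrative, not the project's settings.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), holding out about test_frac of each class.
    Every class contributes at least one test example."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


labels = ["sad"] * 10 + ["happy"] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # → 16 4
print(accuracy(["sad", "happy", "sad", "happy"],
               ["sad", "happy", "happy", "happy"]))  # → 0.75
```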



  2. Toronto Emotional Speech Set - TESS

Dataset description:

○ Voices of two actresses, aged 26 and 64.
○ 7 emotions in total: anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral.
○ Audiometric testing indicated that both actresses have thresholds within the normal range.
○ 1370 samples in total.
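TESS labels can likewise be read from file names. The sketch below assumes the layout found in commonly distributed copies, `<speaker>_<word>_<emotion>.wav` (for example `OAF_back_angry.wav`), with `ps` standing for pleasant surprise; both the naming scheme and the suffix map are assumptions to check against your copy, and `parse_tess_name` is a hypothetical helper.

```python
from pathlib import Path

# Assumed mapping from file-name suffix to the emotion names used above.
# The suffixes (including 'ps' for pleasant surprise) are an assumption.
TESS_EMOTIONS = {
    "angry": "anger", "disgust": "disgust", "fear": "fear",
    "happy": "happiness", "ps": "pleasant surprise",
    "sad": "sadness", "neutral": "neutral",
}


def parse_tess_name(path: str) -> tuple[str, str]:
    """Return (speaker_prefix, emotion) for an assumed
    '<speaker>_<word>_<emotion>.wav' file name."""
    parts = Path(path).stem.split("_")
    if len(parts) < 3:
        raise ValueError(f"unexpected TESS file name: {path}")
    speaker, suffix = parts[0], parts[-1].lower()
    if suffix not in TESS_EMOTIONS:
        raise ValueError(f"unknown emotion suffix {suffix!r} in {path}")
    return speaker, TESS_EMOTIONS[suffix]


print(parse_tess_name("OAF_back_ps.wav"))  # → ('OAF', 'pleasant surprise')
```

Raising on an unknown suffix, rather than silently skipping the file, surfaces naming differences between copies of the dataset early.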

Test accuracy obtained: 98.6%