/Neural-Network-for-Emotion-Recognition-from-Speech

Convolutional neural network for language-independend emotion classification from recorded speech

Primary LanguagePython

Neural Network for Emotion Recognition from Speech

Mono- and cross-lingual emotion classification in recorded speech through a convolutional neural network.

Read the paper here

Data

This model was trained and tested on a collective dataset, consisting of the english IEMOCAP and the french RECOLA datasets. These datasets need to be downloaded manually and processed via OpenSMILE using the configuration file in the input folder! Afterwards transform them to binaries using the write binaries.py

Hyperparameters

Parameter Value
Activation Functions Relu
Loss Function Softmax Cross Entropy
Optimizer ADAM
Init. Learning Rate 0.001
Mini-batch size 50
Stride 3
Dropout 0.5
Epoches 50

achieved accuracy

Englisch testset

Class Mono-lingual Multi-lingual Cross-language
Sadness 0.000 0.015 0.015
Anger 0.019 0.014 0.014
Pleasure 0.043 0.010 0.120
Joy 0.942 0.985 0.864
MICRO 0.421 0.432 0.405

French testset

Class Mono-lingual Multi-lingual Cross-language
Sadness 0.070 0.000 0.230
Anger 0.200 0.200 0.200
Pleasure 0.350 0.035 0.357
Joy 0.754 0.912 0.403
MICRO 0.533 0.524 0.359

Getting Started

You can view the notebook here on github.

Run the notebook

Prerequisites

  • Python 3
  • Tensorflow
  • Jupyter

starting the notebook

Simply open a new terminal in the directory and type:

> jupyter notebook

setup model

make sure you run all codeblocks from top to bottom to setup the network

Running the tests

To test the model, you need only to run the last codeblock. This will evaluate the model and print the accuracy for each testset.

Built With

Contributors