
Speech emotion recognition with 2D CNN LSTM network in PyTorch

Introduction

The network is similar to what Zhao et al. proposed in the paper Speech emotion recognition using deep 1D & 2D CNN LSTM networks.
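A 2D CNN LSTM network of this kind first extracts local time-frequency features from a log-mel spectrogram with convolutional blocks, then models the temporal sequence of those features with an LSTM. The sketch below illustrates that idea; the layer counts, channel sizes, and the 64-bin spectrogram input are illustrative assumptions, not the exact architecture from the paper or from this repository's code.

```python
import torch
import torch.nn as nn


class CNNLSTM2D(nn.Module):
    """Illustrative 2D CNN + LSTM for speech emotion recognition.

    Assumes input spectrograms of shape (batch, 1, 64, frames);
    layer sizes are placeholders, not Zhao et al.'s exact model.
    """

    def __init__(self, num_classes=7):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ELU(),
            nn.MaxPool2d(2),  # 64 mel bins -> 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ELU(),
            nn.MaxPool2d(2),  # 32 mel bins -> 16
        )
        # Each time step feeds the LSTM 32 channels x 16 frequency bins.
        self.lstm = nn.LSTM(input_size=32 * 16, hidden_size=64, batch_first=True)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        feat = self.cnn(x)               # (batch, 32, 16, frames')
        feat = feat.permute(0, 3, 1, 2)  # time first: (batch, frames', 32, 16)
        feat = feat.flatten(2)           # (batch, frames', 32 * 16)
        out, _ = self.lstm(feat)
        return self.fc(out[:, -1])       # classify from the last time step
```

With a batch of two 64-bin, 100-frame spectrograms, `model(torch.randn(2, 1, 64, 100))` yields logits of shape `(2, num_classes)`.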

Datasets

By default, the configuration files expect the datasets to be located in the data folder.

EMO-DB

EMOVO
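Since the project follows victoresque's template, the dataset location is typically set in the data loader section of a JSON configuration file. The fragment below is illustrative only; the actual key names and loader type depend on the configuration files shipped with this repository.

```json
{
  "data_loader": {
    "type": "EmodbDataLoader",
    "args": {
      "data_dir": "data/"
    }
  }
}
```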

Usage

The project structure is a rework of victoresque's PyTorch project template; for more information, check out that repository.

Training

A model can be trained by running train.py and passing the desired configuration file via the --config argument. For example:

python train.py --config <config file>.json

Testing

A trained model can be tested by running test.py and passing the path to the saved checkpoint with the --resume argument. For example, with the default configuration, it would be:

python test.py --resume saved/models/<model name>/<timestamp>/<checkpoint>.pth

Acknowledgements

Thanks to victoresque for the project template.

References

Jianfeng Zhao, Xia Mao and Lijiang Chen. "Speech emotion recognition using deep 1D & 2D CNN LSTM networks". In: Biomedical Signal Processing and Control (Elsevier, 2019), pp. 312–323.