
Speech emotion recognition with 2D CNN LSTM network in PyTorch

Introduction

The network is similar to what Zhao et al. proposed in the paper Speech emotion recognition using deep 1D & 2D CNN LSTM networks.
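A 2D CNN LSTM network of this kind first extracts local time-frequency features from a log-mel spectrogram with convolutional blocks, then models the temporal sequence of those features with an LSTM. The sketch below illustrates that idea; the layer counts, channel sizes, and the 64-bin spectrogram input are illustrative assumptions, not the exact architecture from the paper or from this repository's code.

```python
import torch
import torch.nn as nn


class CNNLSTM2D(nn.Module):
    """Illustrative 2D CNN + LSTM for speech emotion recognition.

    Assumes input spectrograms of shape (batch, 1, 64, frames);
    layer sizes are placeholders, not Zhao et al.'s exact model.
    """

    def __init__(self, num_classes=7):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ELU(),
            nn.MaxPool2d(2),  # 64 mel bins -> 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ELU(),
            nn.MaxPool2d(2),  # 32 mel bins -> 16
        )
        # Each time step feeds the LSTM 32 channels x 16 frequency bins.
        self.lstm = nn.LSTM(input_size=32 * 16, hidden_size=64, batch_first=True)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        feat = self.cnn(x)               # (batch, 32, 16, frames')
        feat = feat.permute(0, 3, 1, 2)  # time first: (batch, frames', 32, 16)
        feat = feat.flatten(2)           # (batch, frames', 32 * 16)
        out, _ = self.lstm(feat)
        return self.fc(out[:, -1])       # classify from the last time step
```

With a batch of two 64-bin, 100-frame spectrograms, `model(torch.randn(2, 1, 64, 100))` yields logits of shape `(2, num_classes)`.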

Datasets

By default, the configuration files expect the datasets to be located in the data folder.

EMO-DB

EMOVO
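Since the project follows victoresque's template, the dataset location is typically set in the data loader section of a JSON configuration file. The fragment below is illustrative only; the actual key names and loader type depend on the configuration files shipped with this repository.

```json
{
  "data_loader": {
    "type": "EmodbDataLoader",
    "args": {
      "data_dir": "data/"
    }
  }
}
```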

Usage

The project structure is a rework of victoresque's PyTorch project template; for more information, check out that repository.

Training

A model can be trained by running train.py and passing the desired configuration file via the --config argument. For example:

python train.py --config <config file>.json

Testing

A trained model can be tested by running test.py and passing the path to the saved checkpoint with the --resume argument. For example, with the default configuration, it would be:

python test.py --resume saved/models/<model name>/<timestamp>/<checkpoint>.pth

Acknowledgements

Thanks to victoresque for the project template.

References

Jianfeng Zhao, Xia Mao and Lijiang Chen. "Speech emotion recognition using deep 1D & 2D CNN LSTM networks". In: Biomedical Signal Processing and Control (Elsevier, 2019), pp. 312–323.