The network is similar to the one Zhao et al. propose in "Speech emotion recognition using deep 1D & 2D CNN LSTM networks".
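As a rough illustration of the 1D CNN + LSTM idea, the sketch below stacks 1D convolutions (local feature extraction) in front of an LSTM (temporal modeling). The class name `CNN1DLSTM`, the layer sizes, and the number of emotion classes are illustrative assumptions, not the exact architecture from the paper or this repository:

```python
import torch
import torch.nn as nn

class CNN1DLSTM(nn.Module):
    """Sketch of a 1D CNN + LSTM classifier (sizes are illustrative)."""

    def __init__(self, n_classes=6):
        super().__init__()
        # 1D convolutions extract local features from the raw waveform
        self.conv = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=3, padding=1),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128),
            nn.ReLU(),
            nn.MaxPool1d(4),
        )
        # LSTM models longer-range temporal dependencies over the feature sequence
        self.lstm = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)
        self.fc = nn.Linear(256, n_classes)

    def forward(self, x):                  # x: (batch, 1, samples)
        feats = self.conv(x)               # (batch, 128, time)
        feats = feats.transpose(1, 2)      # (batch, time, 128)
        _, (h_n, _) = self.lstm(feats)     # last hidden state summarizes the sequence
        return self.fc(h_n[-1])            # (batch, n_classes)

model = CNN1DLSTM()
out = model(torch.randn(2, 1, 1600))
print(out.shape)  # torch.Size([2, 6])
```

The 2D branch of the paper works analogously on log-mel spectrograms with `Conv2d` layers in place of `Conv1d`.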
The configuration files are, by default, set up to look for the datasets in the data folder.
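A minimal configuration sketch, assuming the field layout of victoresque's template (the field names and values here are assumptions, not verified against this repository's config schema):

```json
{
    "name": "SpeechEmotionModel",
    "data_loader": {
        "args": {
            "data_dir": "data/",
            "batch_size": 32
        }
    }
}
```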
The project structure is a rework of victoresque's PyTorch project template, so for more information check out his repository.
A model can be trained by running train.py and passing the desired configuration file via the --config argument. E.g.:

python train.py --config <config file>.json
A trained model can be tested by running test.py and passing the path to a saved checkpoint via the --resume argument. For example, with the default configuration:

python test.py --resume saved/models/<model name>/<timestamp>/<checkpoint>.pth
Thanks to victoresque for the project template.
Jianfeng Zhao, Xia Mao, and Lijiang Chen. "Speech emotion recognition using deep 1D & 2D CNN LSTM networks". In: Biomedical Signal Processing and Control (2019), pp. 312–323.