This project covered the research, development, implementation, and evaluation of a 5-layer convolutional neural network (CNN) for speech emotion recognition (SER). The work was carried out to meet final project requirements and to explore the team's interest in SER. The project uses the following libraries:
- PyTorch
- Librosa
- Scikit-learn
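As a rough illustration of what a 5-layer CNN for SER can look like in PyTorch, the sketch below stacks five 1D convolutional blocks over per-frame acoustic features. It is not the project's model. Reading "5-layer" as five convolutional blocks is an assumption, as are the 40-coefficient MFCC-style input, the channel widths, kernel sizes, dropout rate, and the 7-class output. `SERConvNet` is a hypothetical name.

```python
import torch
import torch.nn as nn


class SERConvNet(nn.Module):
    """Hypothetical 5-block 1D CNN for speech emotion recognition.

    Assumes each clip is a (n_features, n_frames) feature map, e.g. MFCC
    frames. Channel widths, kernel sizes, and class count are illustrative.
    """

    def __init__(self, n_features: int = 40, n_classes: int = 7) -> None:
        super().__init__()
        channels = [n_features, 64, 128, 128, 256, 256]
        blocks = []
        # Five conv blocks: Conv1d -> BatchNorm -> ReLU -> MaxPool (halves time axis)
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks += [
                nn.Conv1d(c_in, c_out, kernel_size=5, padding=2),
                nn.BatchNorm1d(c_out),
                nn.ReLU(),
                nn.MaxPool1d(kernel_size=2, ceil_mode=True),
            ]
        self.features = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool1d(1)  # average over remaining frames
        self.dropout = nn.Dropout(0.3)
        self.classifier = nn.Linear(channels[-1], n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features, n_frames) -> logits: (batch, n_classes)
        x = self.features(x)
        x = self.pool(x).squeeze(-1)
        return self.classifier(self.dropout(x))
```

Averaging over the time axis lets clips of different lengths share one classifier head; padding or cropping every clip to a fixed frame count is a common alternative.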
The main areas of work were:
- preliminary literature search
- data collection
- data exploration and assessment
- pre-processing
- model development and training
- model investigations
- discussion of results
In this study we use four widely used emotional speech datasets (CREMA-D, TESS, RAVDESS, and SAVEE) and apply several data augmentations to balance and expand the data.
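The README does not say which augmentations were applied. The sketch below assumes two common waveform-level ones, additive Gaussian noise and a random circular time shift, and uses them to oversample minority emotion classes until every class matches the largest one. The function names, noise level, and shift range are hypothetical.

```python
from collections import Counter

import numpy as np


def add_noise(y: np.ndarray, noise_factor: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise scaled relative to the clip's peak amplitude."""
    amplitude = noise_factor * np.max(np.abs(y))
    return y + amplitude * rng.standard_normal(y.shape)


def time_shift(y: np.ndarray, max_fraction: float, rng: np.random.Generator) -> np.ndarray:
    """Circularly shift the waveform by up to max_fraction of its length."""
    limit = int(max_fraction * len(y))
    shift = int(rng.integers(-limit, limit + 1))
    return np.roll(y, shift)


def balance_with_augmentation(clips, labels, rng, noise_factor=0.005, max_shift=0.1):
    """Oversample each minority class with augmented copies of its own clips
    until every class has as many examples as the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    out_clips, out_labels = list(clips), list(labels)
    for label, count in counts.items():
        pool = [c for c, lab in zip(clips, labels) if lab == label]
        for _ in range(target - count):
            src = pool[int(rng.integers(len(pool)))]
            aug = time_shift(add_noise(src, noise_factor, rng), max_shift, rng)
            out_clips.append(aug)
            out_labels.append(label)
    return out_clips, out_labels
```

Augmenting only the minority classes both balances the label distribution and adds new examples; the original clips are kept unchanged.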
Training, validation, and testing were carried out on both the combined dataset and each individual dataset, and the results were evaluated and discussed. The final model achieved 48% accuracy on the test data. We draw conclusions on the effectiveness of the different augmentations and datasets, and on how they could be used more effectively in future models.
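One way to set up the train/validation/test split, assuming stratified 80/10/10 proportions (the README does not state them), is to call scikit-learn's `train_test_split` twice. The hypothetical `per_dataset_accuracy` helper then reports test accuracy on the combined data and separately for each source corpus, mirroring the combined-versus-individual comparison. Both helper names are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def split_train_val_test(X, y, val_size=0.1, test_size=0.1, seed=42):
    """Stratified train/validation/test split (proportions are assumptions)."""
    # First carve off validation + test, preserving class proportions.
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=val_size + test_size, stratify=y, random_state=seed)
    # Then split that held-out part into validation and test.
    rel_test = test_size / (val_size + test_size)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=rel_test, stratify=y_tmp, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test


def per_dataset_accuracy(y_true, y_pred, sources):
    """Accuracy on all test clips plus a breakdown by source corpus."""
    y_true, y_pred, sources = map(np.asarray, (y_true, y_pred, sources))
    scores = {"combined": accuracy_score(y_true, y_pred)}
    for name in np.unique(sources):
        mask = sources == name
        scores[str(name)] = accuracy_score(y_true[mask], y_pred[mask])
    return scores
```

Stratifying both splits keeps each emotion's share roughly equal across train, validation, and test, which matters when some corpora contribute far more clips than others.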