speech_emotion_recognition

Experimental results on two kinds of features: 1D (LightGBM, MLP) and 2D (deep learning).

Directory tree: SourceCode

Feature_1d

  -- cremad
            --- cremad_mfcc.ipynb
            --- creamad_melspectrogram.ipynb
            --- cremad_tempogram.ipynb
            --- cremad_features_combination.ipynb
            --- emotion_path.csv
  -- ravdess
            --- ravdess_mfcc.ipynb
            --- ravdess_melspectrogram.ipynb
            --- ravdess_tempogram.ipynb
            --- ravdess_features_combination.ipynb
            --- ravdess_path.csv
  -- ravdess_cremad_features_combination.ipynb

Feature_2d

  -- crema_d_4emo.ipynb
  -- ravdess_4emo.ipynb
  -- done_ravedess_and_crema_d_4emo.ipynb

In the Feature_1d folder, the *.ipynb notebooks only require you to set the paths to the audio files (download the datasets first):
- cremad: .../cremad/AudioWAV/. See emotion_path.csv for examples of the generated paths to each audio file.
- ravdess: .../ravdess/audio_speech_actors_01-24/. See ravdess_path.csv for examples of the generated paths to each audio file.
- Adjust the path where the trained model is saved to suit your setup.
The path file only needs to be created once and can be reused as long as the audio location does not change, so keep these files to save time.
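As an illustration of how such a path file could be generated, here is a minimal sketch that uses only the Python standard library. It relies on the public file-naming conventions of the two datasets (the third field of each file name is the emotion code). The function names `label_from_name` and `build_path_csv` and the CSV column names `emotion` and `path` are hypothetical and may differ from the layout of emotion_path.csv and ravdess_path.csv in this repository.

```python
import csv
from pathlib import Path

# CREMA-D file names look like 1001_DFA_ANG_XX.wav (third "_" field = emotion).
CREMAD_EMOTIONS = {
    "ANG": "angry", "DIS": "disgust", "FEA": "fear",
    "HAP": "happy", "NEU": "neutral", "SAD": "sad",
}

# RAVDESS file names look like 03-01-06-01-02-01-12.wav (third "-" field = emotion).
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


def label_from_name(file_name, dataset):
    """Return the emotion label encoded in a file name, or None if unknown."""
    stem = Path(file_name).stem
    if dataset == "cremad":
        parts, table = stem.split("_"), CREMAD_EMOTIONS
    elif dataset == "ravdess":
        parts, table = stem.split("-"), RAVDESS_EMOTIONS
    else:
        raise ValueError(f"unknown dataset: {dataset}")
    return table.get(parts[2]) if len(parts) >= 3 else None


def build_path_csv(audio_root, dataset, out_csv):
    """Scan audio_root recursively for .wav files and write (emotion, path) rows."""
    rows = []
    for wav in sorted(Path(audio_root).rglob("*.wav")):
        label = label_from_name(wav.name, dataset)
        if label is not None:  # skip files that do not follow the naming scheme
            rows.append((label, str(wav)))
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["emotion", "path"])  # hypothetical column names
        writer.writerows(rows)
    return len(rows)
```

To use it, call `build_path_csv(audio_root, "ravdess", "ravdess_path.csv")` with `audio_root` set to your local audio_speech_actors_01-24 folder, or `build_path_csv(audio_root, "cremad", "emotion_path.csv")` with the AudioWAV folder. The return value is the number of rows written.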
Run the cells in the *.ipynb notebooks to extract the features and train the models.
Cite us

@inproceedings{duong2022empirical,
  title={An Empirical Experiment on Feature Extractions Based for Speech Emotion Recognition},
  author={Duong, Binh Van and Ha, Chien Nhu and Nguyen, Trung T and Nguyen, Phuc and Do, Trong-Hop},
  booktitle={Asian Conference on Intelligent Information and Database Systems},
  pages={180--191},
  year={2022},
  organization={Springer}
}
