This repo contains the code for detecting emotion from the conversational dataset IEMOCAP, implementing the paper "Multimodal Transformer With Learnable Frontend and Self Attention for Emotion Recognition" submitted to ICASSP 2022. This repository covers the split where Session 5 is considered as test and Session 1 as validation.
- The implementation has three stages: training the unimodal audio and text models, training the Bi-GRU models with self-attention, and training the multimodal transformer.
- With the wav files for audio and the csv files for text, the first step is to run `audio_model.py` for audio and the notebook `sentiment_text.ipynb` for text.
- The representations from the trained models in the step above are used to create pickle files for the entire dataset (a sketch of this step follows this list).
- With these representations, two Bi-GRU models with self-attention (refer to `bigru_audio.ipynb` and `bigru_text.ipynb`) are trained. The best models for both audio and text are already provided in the `unimodal_models` folder.
- A multimodal transformer is trained on both modalities of the dataset to obtain the final accuracy results.
- Please note that usage of IEMOCAP requires permission; once permission has been obtained, we can share the dataset files. To request permission, please visit the IEMOCAP release page.
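The layout of these pickle files is not fixed by this README; purely as an illustration, the sketch below shows how utterance-level representations from the trained unimodal models could be collected and written to a file under `features/`. The iteration format of `dataset` and the call signatures of `audio_model` / `text_model` are assumptions, not the repo's actual interfaces.

```python
import pickle

import torch

def build_feature_dict(dataset, audio_model, text_model, device="cpu"):
    """Collect per-utterance audio/text embeddings into one dictionary (illustrative)."""
    features = {}
    audio_model.eval()
    text_model.eval()
    with torch.no_grad():
        for utt_id, wav, tokens, label in dataset:  # assumed per-utterance items
            audio_emb = audio_model(wav.unsqueeze(0).to(device)).squeeze(0).cpu()
            text_emb = text_model(tokens.unsqueeze(0).to(device)).squeeze(0).cpu()
            features[utt_id] = {"audio": audio_emb, "text": text_emb, "label": label}
    return features

def save_features(features, path):
    """Serialize the feature dictionary to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(features, f)

# Hypothetical usage:
# save_features(build_feature_dict(test_set, audio_model, text_model), "features/session5_test.pkl")
```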
- Clone the repository: https://github.com/iiscleap/multimodal_emotion_recognition.git
- For the LEAF-CNN framework for audio sentiment classification, we use the PyTorch implementation of LEAF (cloned below); a usage sketch appears after the file layout.
- Run `python3 -m venv .leaf_venv`
- Run `source .leaf_venv/bin/activate`
- Run `pip install -r requirements_leaf.txt`
- Clone the repository: https://github.com/denfed/leaf-audio-pytorch.git
- The files should be arranged as follows:
    Sess5
    └───leaf_wavs_train
        └───Ses01F_impro01_F000.wav
        └───Ses01F_impro01_F005.wav
        └───...
    └───leaf_wavs_test
        └───Ses05F_impro01_F000.wav
        └───Ses05F_impro01_F008.wav
        └───...
    label_dict.json
    leaf-audio-pytorch-main
    │   └───__init__.py
    │   └───setup.py
    │   └───...
    └───audio_model.py
    └───leaf_audio_pytorch
    └───...
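With this layout, `audio_model.py` can import the LEAF frontend from the cloned package. As a rough sketch of a LEAF-CNN audio classifier (assuming the package exposes `leaf_audio_pytorch.frontend.Leaf` as in the upstream LEAF implementation; the CNN head, tensor shapes, and the 4-class output are illustrative placeholders, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

import leaf_audio_pytorch.frontend as frontend  # from the cloned leaf-audio-pytorch repo

class LeafCNN(nn.Module):
    """Illustrative LEAF-CNN audio emotion classifier (not the repo's exact model)."""

    def __init__(self, num_classes=4):
        super().__init__()
        self.leaf = frontend.Leaf()  # learnable filterbank frontend (default settings assumed)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(16 * 4 * 4, num_classes)

    def forward(self, wav):
        # wav: raw waveform batch; the exact shape expected by Leaf is assumed here
        feats = self.leaf(wav)      # time-frequency features from the learnable frontend
        feats = feats.unsqueeze(1)  # add a channel dimension for the 2-D CNN
        out = self.cnn(feats)
        return self.classifier(out.flatten(1))
```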
- Run `python3 audio_model.py` to train the audio unimodal model
- Running `sentiment_text.ipynb` provides the text unimodal model; an illustrative sketch of such a text classifier follows.
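The text architecture itself is defined inside the notebook and is not reproduced here. Purely as an illustration of a text emotion classifier, the sketch below fine-tunes a generic pretrained encoder via Hugging Face `transformers`; the choice of `bert-base-uncased`, the [CLS]-pooling, and the 4-class head are assumptions, not necessarily what the paper uses.

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TextEmotionClassifier(nn.Module):
    """Illustrative text emotion classifier (encoder choice is an assumption)."""

    def __init__(self, model_name="bert-base-uncased", num_classes=4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token as the utterance representation
        return self.classifier(cls)

# Hypothetical usage:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["I can't believe this happened!"], return_tensors="pt", padding=True)
logits = TextEmotionClassifier()(batch["input_ids"], batch["attention_mask"])
```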
- To train the two Bi-GRU models with self-attention, run `bigru_audio.ipynb` to get `best_model_aud0.tar` and `bigru_text.ipynb` to get `best_model_text0.tar`. These are to be placed in the folder `unimodal_models`. A minimal sketch of such a model is shown below.
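A minimal sketch of a bidirectional GRU encoder with self-attention pooling, the kind of model these notebooks train over the pickled utterance representations; the feature dimension, hidden size, and single-head attention are placeholders rather than the exact settings of the notebooks.

```python
import torch
import torch.nn as nn

class BiGRUSelfAttention(nn.Module):
    """Illustrative Bi-GRU with self-attention pooling (dimensions are placeholders)."""

    def __init__(self, input_dim=256, hidden_dim=128, num_classes=4):
        super().__init__()
        self.bigru = nn.GRU(input_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attention = nn.Linear(2 * hidden_dim, 1)  # one attention score per time step
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):                                  # x: (batch, seq_len, input_dim)
        h, _ = self.bigru(x)                               # (batch, seq_len, 2 * hidden_dim)
        weights = torch.softmax(self.attention(h), dim=1)  # attention weights over time
        pooled = (weights * h).sum(dim=1)                  # attention-weighted summary
        return self.classifier(pooled)

# Example: a batch of 8 sequences, each with 50 frames of 256-dim features
logits = BiGRUSelfAttention()(torch.randn(8, 50, 256))
```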
- For running the multimodal transformer, we create another environment
- Run `python3 -m venv .trans_venv`
- Run `source .trans_venv/bin/activate`
- Run `pip install -r requirements.txt`
- With the config file provided, run `python3 main.py`
- The files at this stage should be arranged as follows:
    main.py
    config.py
    features
    └───<PICKLE_FILE>
    src
    └───model_lstm_tranformers.py
    └───read_data.py
    └───test_lstm_transformers.py
    └───...
    unimodal_models
    └───best_model_aud0.tar
    └───best_model_text0.tar
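The fusion model itself is implemented in `src/model_lstm_tranformers.py`. Purely to illustrate the idea of a multimodal transformer over the two modalities, the sketch below fuses projected audio and text sequences with a standard `nn.TransformerEncoder`; the projection sizes, number of layers, heads, and pooling are placeholders, not the repo's actual configuration.

```python
import torch
import torch.nn as nn

class MultimodalTransformer(nn.Module):
    """Illustrative audio-text fusion transformer (not the repo's exact model)."""

    def __init__(self, audio_dim=256, text_dim=768, d_model=256, num_classes=4):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.text_proj = nn.Linear(text_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, audio, text):
        # audio: (batch, T_audio, audio_dim), text: (batch, T_text, text_dim)
        tokens = torch.cat([self.audio_proj(audio), self.text_proj(text)], dim=1)
        fused = self.encoder(tokens)  # joint self-attention across both modalities
        return self.classifier(fused.mean(dim=1))

# Example: batch of 8 utterances with 50 audio frames and 30 text tokens each
logits = MultimodalTransformer()(torch.randn(8, 50, 256), torch.randn(8, 30, 768))
```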