MOSEI_UMONS

A Transformer-based joint-encoding for Emotion Recognition and Sentiment Analysis


  

PyTorch implementation of the paper "A Transformer-based joint-encoding for Emotion Recognition and Sentiment Analysis", winner of the Challenge-HML Best Paper Award.

@inproceedings{delbrouck-etal-2020-transformer,
    title = "A Transformer-based joint-encoding for Emotion Recognition and Sentiment Analysis",
    author = "Delbrouck, Jean-Benoit  and
      Tits, No{\'e}  and
      Brousmiche, Mathilde  and
      Dupont, St{\'e}phane",
    booktitle = "Second Grand-Challenge and Workshop on Multimodal Language (Challenge-HML)",
    month = jul,
    year = "2020",
    address = "Seattle, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.challengehml-1.1",
    doi = "10.18653/v1/2020.challengehml-1.1",
    pages = "1--7"
}

Model

Model_LA is the model used in the UMONS solution to the MOSEI dataset, using only linguistic and acoustic inputs.
Results can be reproduced with the following Google Colab notebook: Open In Colab
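For illustration only, here is a minimal PyTorch sketch of the joint-encoding idea. This is not the repository's actual Model_LA; the layer choices and input dimensions are hypothetical (the hyperparameter names mirror the flags used in the Training section below):

import torch
import torch.nn as nn

# Minimal sketch: project each modality into a shared space, concatenate
# along the time axis, and encode the joint sequence with a Transformer.
# Positional encodings and padding masks are omitted for brevity.
class JointEncoderSketch(nn.Module):
    def __init__(self, lang_dim=300, audio_dim=80, hidden_size=512,
                 multi_head=4, layer=4, ff_size=1024, num_classes=7):
        super().__init__()
        self.lang_proj = nn.Linear(lang_dim, hidden_size)
        self.audio_proj = nn.Linear(audio_dim, hidden_size)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=multi_head,
            dim_feedforward=ff_size, dropout=0.1)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layer)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, lang, audio):
        # lang: (batch, t_lang, lang_dim); audio: (batch, t_audio, audio_dim)
        joint = torch.cat([self.lang_proj(lang), self.audio_proj(audio)], dim=1)
        encoded = self.encoder(joint.transpose(0, 1))  # encoder expects (seq, batch, dim)
        return self.classifier(encoded.mean(dim=0))    # mean-pool over time, then classify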

Environment

Create a Python 3.6 environment with:

torch              1.2.0    
torchvision        0.4.0   
numpy              1.18.1    

We use the GloVe vectors distributed with spaCy. They can be installed into your environment with the following commands:

wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
pip install en_vectors_web_lg-2.1.0.tar.gz
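As a quick sanity check that the vectors installed correctly (this snippet is only a verification aid, not part of the repository):

import spacy

# Load the installed GloVe package and embed a short text.
nlp = spacy.load('en_vectors_web_lg')
doc = nlp("a transformer-based joint-encoding")
print(doc[0].vector.shape)  # each token maps to a 300-dimensional GloVe vector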

Data

Download the data from here.
Unzip the files into the 'data' folder.
More information about the data can be found in the 'data' folder.
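Assuming the downloaded archive is a zip file named data.zip (the actual filename depends on the download link above), the unzip step looks like:

unzip data.zip -d data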

Training

To train a Model_LA model on the sentiment task, use the following command:

CUDA_VISIBLE_DEVICES=1 python main.py --model Model_LA --name mymodel --task sentiment --multi_head 4 --ff_size 1024 --hidden_size 512 --layer 4 --batch_size 32 --lr_base 0.0001 --dropout_r 0.1

To train several models in a row (e.g., for the ensembling step described below), wrap the command in a shell loop:

for i in {1..4}; do CUDA_VISIBLE_DEVICES=1 python main.py --model Model_LA --name mymodel --task sentiment --multi_head 4 --ff_size 1024 --hidden_size 512 --layer 4 --batch_size 32 --lr_base 0.0001 --dropout_r 0.1; done

Checkpoints are saved in the folder ckpt/mymodel.

The task argument can be set to emotion or sentiment. For a binarized sentiment training (positive vs. negative), add --task_binary True.
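For example, an emotion run with the same hyperparameters (the run name mymodel_emotion is only illustrative):

CUDA_VISIBLE_DEVICES=1 python main.py --model Model_LA --name mymodel_emotion --task emotion --multi_head 4 --ff_size 1024 --hidden_size 512 --layer 4 --batch_size 32 --lr_base 0.0001 --dropout_r 0.1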

Evaluation

You can evaluate a model by typing:

python ensembling.py --name mymodel

The task settings are stored in the checkpoint state dict, so evaluation is carried out on the dataset and task you trained your model on.

By default, the script globs all training checkpoints inside the folder and performs ensembling over them.
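The exact reduction lives in ensembling.py; as a rough mental model, a common scheme is to average the class probabilities predicted by each checkpoint, as in this hypothetical sketch (the checkpoint format and the averaging step are assumptions, not the repository's confirmed behavior):

import glob
import torch

def ensemble_predict(model, ckpt_paths, inputs):
    # Average class probabilities over checkpoints (illustrative only).
    probs = []
    for path in ckpt_paths:
        model.load_state_dict(torch.load(path, map_location='cpu'))
        model.eval()
        with torch.no_grad():
            probs.append(torch.softmax(model(inputs), dim=-1))
    return torch.stack(probs).mean(dim=0)

ckpt_paths = sorted(glob.glob('ckpt/mymodel/*'))  # every checkpoint in the run folder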

Results

Results were obtained on a single GeForce GTX 1080 Ti.
Training performance:

Modality | Memory usage | GPU usage | sec / epoch | Parameters | Checkpoint size
Linguistic + acoustic | 320 MB | 2400 MiB | 103 | ~33 M | 397 MB
Linguistic + acoustic + vision | | | | |

You should obtain results close to the following:

Task | Val accuracy | Test accuracy | Test ensemble | Epochs
Sentiment-7 | 43.61 | 43.90 | 45.36 | 6
Sentiment-2 | 82.30 | 81.53 | 82.26 | 8
Emotion-6 | 81.21 | 81.29 | 81.48 | 3

Ensemble results use at most 5 single models.
The 7-class and 2-class sentiment models and the emotion models were trained according to the instructions here.

Pre-trained checkpoints

The Sentiment-7 ensemble result is obtained from these checkpoints: Download Link
The Sentiment-2 ensemble result is obtained from these checkpoints: Download Link
The Emotion ensemble result is obtained from these checkpoints: Download Link
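To evaluate a downloaded ensemble, unpack the checkpoints into a folder under ckpt and point the evaluation script at that folder (the folder name sentiment_7 below is only an example):

python ensembling.py --name sentiment_7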

License

The source code is licensed under the MIT license, which you can find in the LICENSE file.