PyTorch implementation of the paper "A Transformer-based joint-encoding for Emotion Recognition and Sentiment Analysis"
Challenge-HML Best Paper Award
@inproceedings{delbrouck-etal-2020-transformer,
title = "A Transformer-based joint-encoding for Emotion Recognition and Sentiment Analysis",
author = "Delbrouck, Jean-Benoit and
Tits, No{\'e} and
Brousmiche, Mathilde and
Dupont, St{\'e}phane",
booktitle = "Second Grand-Challenge and Workshop on Multimodal Language (Challenge-HML)",
month = jul,
year = "2020",
address = "Seattle, USA",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.challengehml-1.1",
doi = "10.18653/v1/2020.challengehml-1.1",
pages = "1--7"
}
The model Model_LA is the module used for the UMONS solution to the MOSEI dataset using only linguistic and acoustic inputs.
Results can be replicated at the following Google Colab sheet:
Create a Python 3.6 environment with:
torch 1.2.0
torchvision 0.4.0
numpy 1.18.1
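To confirm that a new environment matches the versions listed above, a small check like the following can help. It is a minimal sketch and not part of the repository: the helper name `check_versions` is hypothetical, and it assumes each package exposes a `__version__` attribute (which torch, torchvision and numpy do).

```python
import importlib

# Versions listed above.
EXPECTED = {"torch": "1.2.0", "torchvision": "0.4.0", "numpy": "1.18.1"}


def check_versions(expected):
    """Return {package: (expected, found)} for each missing or mismatched package."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = getattr(importlib.import_module(name), "__version__", None)
        except ImportError:
            found = None  # package is not installed
        # Prefix match, so a local build tag such as "1.2.0+cu92" is accepted.
        if found is None or not str(found).startswith(wanted):
            problems[name] = (wanted, found)
    return problems
```

Calling `check_versions(EXPECTED)` inside the environment should return an empty dict when all three packages match.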
We use GloVe vectors from spaCy. They can be installed into your environment with the following commands:
wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
pip install en_vectors_web_lg-2.1.0.tar.gz
Download data from here.
Unzip the files into the 'data' folder
More information about the data can be found in the 'data' folder.
To train a Model_LA model on the emotion labels, use the following command:
python main.py --model Model_LA --name mymodel --task emotion --multi_head 4 --ff_size 1024 --hidden_size 512 --layer 4 --batch_size 32 --lr_base 0.0001 --dropout_r 0.1
Checkpoints are created in the folder ckpt/mymodel.
The --task argument can be set to emotion or sentiment. To train on binarized sentiment (positive or negative), use --task_binary True
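The flags used in the training command can be pictured as an argument parser along the following lines. This is an illustrative sketch only, not the repository's main.py: the defaults are simply copied from the example command, the `str2bool` helper is hypothetical, and the real script may define more options or handle booleans differently.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False' strings; argparse's type=bool treats any non-empty string as True."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    # Hypothetical parser mirroring the flags in the training command;
    # defaults are assumptions taken from that example.
    parser = argparse.ArgumentParser(description="Training options (illustrative)")
    parser.add_argument("--model", type=str, default="Model_LA")
    parser.add_argument("--name", type=str, default="mymodel")
    parser.add_argument("--task", type=str, choices=["emotion", "sentiment"], default="emotion")
    parser.add_argument("--task_binary", type=str2bool, default=False)
    parser.add_argument("--multi_head", type=int, default=4)
    parser.add_argument("--ff_size", type=int, default=1024)
    parser.add_argument("--hidden_size", type=int, default=512)
    parser.add_argument("--layer", type=int, default=4)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--lr_base", type=float, default=0.0001)
    parser.add_argument("--dropout_r", type=float, default=0.1)
    return parser


# Equivalent of passing: --task sentiment --task_binary True
args = build_parser().parse_args(["--task", "sentiment", "--task_binary", "True"])
print(args.task, args.task_binary)  # sentiment True
```

Passing the boolean as an explicit string (True or False) matches the form shown above; a plain `type=bool` would silently turn `--task_binary False` into True, which is why the sketch uses a converter.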
You can evaluate a model by running:
python ensembling.py --name mymodel
The task settings are stored in the checkpoint state dict, so the evaluation is carried out on the dataset the model was trained on.
By default, the script globs all the training checkpoints inside the folder and ensembles them.
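The glob-then-ensemble step can be sketched as follows. This is not the code in ensembling.py: the function names are hypothetical, the `*` glob pattern stands in for the real checkpoint extension, and averaging class scores before taking the argmax is an assumed combination rule for a single-label task (the script may combine models differently).

```python
import glob
import os


def find_checkpoints(name, root="ckpt", pattern="*"):
    """Return every file under ckpt/<name>, sorted for reproducibility.

    The '*' pattern is an assumption; narrow it to the real checkpoint extension.
    """
    return sorted(glob.glob(os.path.join(root, name, pattern)))


def ensemble_predict(per_model_scores):
    """Average class scores across models, then take the argmax per sample.

    per_model_scores: one entry per model, each a list of per-sample
    lists of per-class scores.
    """
    n_models = len(per_model_scores)
    n_samples = len(per_model_scores[0])
    predictions = []
    for i in range(n_samples):
        n_classes = len(per_model_scores[0][i])
        mean = [sum(m[i][c] for m in per_model_scores) / n_models for c in range(n_classes)]
        predictions.append(max(range(n_classes), key=mean.__getitem__))
    return predictions


# Two toy "models" scoring two samples over three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
print(ensemble_predict([model_a, model_b]))  # [1, 2]
```

In the toy example the two models disagree on both samples, and the averaged scores decide the final class.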
Results were obtained on a single GeForce GTX 1080 Ti.
Training performances:
Modality | Memory Usage | GPU Usage | sec / epoch | Parameters | Checkpoint size |
---|---|---|---|---|---|
Linguistic + acoustic | 320 MB | 2400 MiB | 103 | ~33 M | 397 MB |
Linguistic + acoustic + vision | | | | | |
You should obtain approximately the following results:
Task Accuracy | val | test | test ensemble | epochs |
---|---|---|---|---|
Sentiment-7 | 43.61 | 43.90 | 45.36 | 6 |
Sentiment-2 | 82.30 | 81.53 | 82.26 | 8 |
Emotion-6 | 81.21 | 81.29 | 81.48 | 3 |
Ensemble results combine at most 5 single models.
The 7-class and 2-class sentiment models and the emotion models were trained according to the instructions here.
Result Sentiment-7 ensemble is obtained from these checkpoints: Download Link
Result Sentiment-2 ensemble is obtained from these checkpoints: Download Link
Result Emotion ensemble is obtained from these checkpoints: Download Link