A Self-Supervised Learning (SSL) Method for Wearable Emotion Recognition (WER)
This repository contains the official implementation of the paper: Transformer-Based Self-Supervised Multimodal Representation Learning for Wearable Emotion Recognition
Model Architecture
Overview of our self-supervised multimodal representation learning framework. The proposed self-supervised learning (SSL) model is first pre-trained with signal transform recognition as the pretext task to learn generalized multimodal representations. The encoder of the resulting pre-trained model then serves as a feature extractor for downstream tasks, where it is either frozen or fine-tuned on labeled samples to predict emotion classes.
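The signal-transform-recognition pretext task can be illustrated with a minimal sketch: each unlabeled window is modified by a randomly chosen transformation, and the model is trained to predict which transformation was applied. The specific transforms below (identity, jitter, scaling, segment permutation) are common choices in SSL for physiological signals and are assumptions for illustration; the paper's exact transform set may differ.

```python
import numpy as np

# Hypothetical signal transformations used as pretext-task classes.
def jitter(x, sigma=0.05):
    """Add Gaussian noise to the signal."""
    return x + np.random.normal(0.0, sigma, x.shape)

def scale(x, factor=1.5):
    """Scale the signal amplitude."""
    return x * factor

def permute(x, n_segments=4):
    """Split the signal into segments and shuffle their order."""
    segs = np.array_split(x, n_segments)
    np.random.shuffle(segs)
    return np.concatenate(segs)

TRANSFORMS = [lambda x: x, jitter, scale, permute]  # class 0 = original

def make_pretext_batch(signals):
    """Apply a random transform to each window; the transform index is the label."""
    X, y = [], []
    for sig in signals:
        label = np.random.randint(len(TRANSFORMS))
        X.append(TRANSFORMS[label](sig))
        y.append(label)
    return np.stack(X), np.array(y)

windows = np.random.randn(8, 128)       # 8 unlabeled windows of 128 samples
X, y = make_pretext_batch(windows)      # inputs and pseudo-labels for pre-training
```

A classifier trained on (X, y) learns representations without any emotion labels, which is what makes large unlabeled corpora such as PRESAGE usable for pre-training.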
Usage
1. Set up conda environment
conda env create -f environment.yml
conda activate SSL
2. Datasets
The pre-trained SSL model was evaluated on three multimodal datasets: WESAD, CASE, and K-EmoCon. Please cite the original creators when using these datasets.
3. Train the SSL model
python SSL.py --path=<path to the downloaded codes> --data_path=<path to the unlabeled data>
In the paper, we use the PRESAGE dataset, which we collected at the Presage Training Center in Lille, France, for self-supervised pre-training. Discussions with the funders and the University of Lille are underway to make this dataset publicly accessible. In the meantime, the pre-trained models are shared in the folder pretrained_models. You can also use your own unlabeled data for pre-training.
4. Evaluate the SSL model on supervised emotion datasets
For WESAD:
python SL.py --path=<path to the downloaded codes> --dataset_opt='WESAD' --data_path=<path to data> --best_model_dir=<path to the pretrained model> --sl_num_classes=<number of emotion categories: 2 or 3> --mode=<training mode: 'freeze' or 'fine_tune'>
For CASE/K-EmoCon, you need to specify the emotional dimension, i.e., valence or arousal:
python SL.py --path=<path to the downloaded codes> --dataset_opt='CASE'/'KemoCon' --data_path=<path to data> --best_model_dir=<path to the pretrained model> --sl_num_classes=<number of emotion categories: 2 or 3> --mode=<training mode: 'freeze' or 'fine_tune'> --av_opt=<emotional dimension: 'valence' or 'arousal'>
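The difference between the two training modes can be sketched as follows: in 'freeze' mode the pre-trained encoder weights stay fixed and only a small classification head is trained on the frozen features (a linear probe), while in 'fine_tune' mode gradients also update the encoder. The tiny numpy example below is an illustrative sketch, not the repository's actual implementation; the random "encoder" stands in for the pre-trained Transformer encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-trained "encoder": a fixed nonlinear projection.
W_enc = rng.normal(size=(16, 8))

def encode(x):
    return np.tanh(x @ W_enc)

# Toy labeled downstream data (binary emotion classes).
X = rng.normal(size=(64, 16))
y = (X.sum(axis=1) > 0).astype(float)

# 'freeze' mode: features are computed once and the encoder never updates;
# only the linear head W_head is trained with logistic regression.
feats = encode(X)
W_head = np.zeros(8)
lr = 0.1
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-feats @ W_head))  # sigmoid predictions
    grad = feats.T @ (p - y) / len(y)          # logistic-loss gradient
    W_head -= lr * grad                        # only the head moves

# In 'fine_tune' mode, the gradient would also flow into W_enc,
# adapting the encoder itself to the labeled emotion data.
```

Freezing is cheaper and guards against overfitting on small labeled sets such as K-EmoCon, whereas fine-tuning can adapt the representation more closely to the target dataset.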
Acknowledgements
The proposed work was supported by the French State, managed by the National Agency for Research (ANR), under the Investments for the Future program with reference ANR-16-IDEX-0004 ULNE.
Citation
If this work is useful for your research, please cite:
@ARTICLE{10091193,
author={Wu, Yujin and Daoudi, Mohamed and Amad, Ali},
journal={IEEE Transactions on Affective Computing},
title={Transformer-Based Self-Supervised Multimodal Representation Learning for Wearable Emotion Recognition},
year={2023},
volume={},
number={},
pages={1-16},
doi={10.1109/TAFFC.2023.3263907}}