WER-SSL


A Self-Supervised Learning (SSL) Method for Wearable Emotion Recognition (WER)

This repository contains the official implementation of the paper: Transformer-Based Self-Supervised Multimodal Representation Learning for Wearable Emotion Recognition

Model Architecture

Overview of our self-supervised multimodal representation learning framework. The proposed self-supervised learning (SSL) model is first pre-trained with signal transform recognition as the pretext task to learn generalized multimodal representations. The encoder of the resulting pre-trained model then serves as a feature extractor for downstream tasks; it is either frozen or fine-tuned on labeled samples to predict emotion classes.
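To make the pretext task concrete, here is a minimal Python sketch of generating pretext labels by applying random transformations to windows of sensor data. The transformation set and function names are illustrative assumptions and do not reproduce the exact transformations used in the paper.

import numpy as np

# Minimal sketch of the signal-transform-recognition pretext task.
# The transformation set below is illustrative, not the paper's exact list.

def add_noise(x, sigma=0.05):
    return x + np.random.normal(0.0, sigma, x.shape)

def scale(x, factor=1.5):
    return x * factor

def permute(x, n_segments=4):
    segments = np.array_split(x, n_segments)
    np.random.shuffle(segments)
    return np.concatenate(segments)

TRANSFORMS = [lambda x: x, add_noise, scale, permute]  # label 0 = original signal

def make_pretext_batch(windows):
    # Apply a randomly chosen transform to each window; the transform index is the label.
    xs, ys = [], []
    for x in windows:
        label = np.random.randint(len(TRANSFORMS))
        xs.append(TRANSFORMS[label](x))
        ys.append(label)
    return np.stack(xs), np.array(ys)

X, y = make_pretext_batch(np.random.randn(8, 256))  # 8 windows of 256 samples each
print(X.shape, y)

The SSL model is then trained to predict these transform labels, which encourages the encoder to learn signal representations that transfer to emotion recognition.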

Usage

1. Set up conda environment

conda env create -f environment.yml
conda activate SSL

2. Datasets

The pre-trained SSL model was evaluated on three multimodal emotion datasets: WESAD, CASE, and K-EmoCon. Please cite the original creators of these datasets if you use them.

3. Train the SSL model

python SSL.py --path=<path to the downloaded code> --data_path=<path to the unlabeled data>

In the paper, we use the PRESAGE dataset, which we collected at the Presage Training Center in Lille, France, for self-supervised pre-training. Discussions with the funders and the University of Lille are underway to make this dataset publicly accessible. In the meantime, the pre-trained models are shared in the folder pretrained_models. You can also use your own unlabeled data for pre-training.
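If you want to inspect one of the shared checkpoints before passing it to SL.py, the following is a minimal sketch. It assumes the files in pretrained_models are standard PyTorch checkpoints saved with torch.save; the file name is a placeholder.

import torch

# Minimal sketch, assuming the shared checkpoints are standard PyTorch files.
# Replace the placeholder with an actual file name from pretrained_models.
checkpoint = torch.load("pretrained_models/<checkpoint name>", map_location="cpu")
print(type(checkpoint))  # e.g., a state dict of encoder weights or a full model object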

4. Evaluate the SSL model on supervised emotion datasets

For WESAD:

python SL.py --path=<path to the downloaded code> --dataset_opt='WESAD' --data_path=<path to data> --best_model_dir=<path to the pretrained model> --sl_num_classes=<number of emotion categories: 2 or 3> --mode=<training mode: 'freeze' or 'fine_tune'>

For CASE/K-EmoCon, you need to specify the emotional dimension, i.e., valence or arousal:

python SL.py --path=<path to the downloaded code> --dataset_opt='CASE'/'KemoCon' --data_path=<path to data> --best_model_dir=<path to the pretrained model> --sl_num_classes=<number of emotion categories: 2 or 3> --mode=<training mode: 'freeze' or 'fine_tune'> --av_opt=<emotional dimension: 'valence' or 'arousal'>
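The --mode option selects between the two evaluation protocols described above. In PyTorch terms, the difference roughly corresponds to the sketch below; the actual encoder class and option handling in SL.py may differ.

import torch

def set_encoder_mode(encoder: torch.nn.Module, mode: str):
    # Rough illustration of the two protocols; not the exact implementation in SL.py.
    if mode == "freeze":
        # Encoder weights stay fixed; only the emotion classification head is trained.
        for p in encoder.parameters():
            p.requires_grad = False
        encoder.eval()
    elif mode == "fine_tune":
        # Encoder weights are updated together with the classification head.
        for p in encoder.parameters():
            p.requires_grad = True
        encoder.train()
    else:
        raise ValueError(f"Unknown mode: {mode}")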

Acknowledgements

This work was supported by the French State, managed by the French National Research Agency (ANR), under the Investments for the Future program with reference ANR-16-IDEX-0004 ULNE.

Citation

If this work is useful for your research, please cite:

@ARTICLE{10091193,
  author={Wu, Yujin and Daoudi, Mohamed and Amad, Ali},
  journal={IEEE Transactions on Affective Computing}, 
  title={Transformer-Based Self-Supervised Multimodal Representation Learning for Wearable Emotion Recognition}, 
  year={2023},
  volume={},
  number={},
  pages={1-16},
  doi={10.1109/TAFFC.2023.3263907}}