
Modality-Transferable-MER: a multimodal emotion recognition model with zero-shot and few-shot abilities.


Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition

CC BY 4.0

Paper accepted at AACL-IJCNLP 2020:

Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition, by Wenliang Dai, Zihan Liu, Tiezheng Yu, Pascale Fung.

[ACL Anthology][ArXiv][Semantic Scholar]

We use the pre-processed features from the CMU-Multimodal SDK.

Alternatively, you can download the data directly from here.

Preparation for running

  1. Create a new folder named data at the root of this project

  2. Download the emotion embeddings from here and put them in the data folder.

  3. Download data

    • For a quick run
      • Download our saved torch.utils.data.dataset.Dataset datasets from here and unzip them at the root of this project.
    • For a normal run
      • Download the data from here
      • Check the data_folder_structure.txt file, which shows how to organize the data files
      • Place the data files accordingly
  4. Good to go!
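Before launching a run, it can help to confirm that the layout from the steps above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, and only the `data` folder is checked because it is the only path the steps name explicitly.

```python
from pathlib import Path


def missing_paths(project_root, expected):
    """Return the entries of `expected` that do not exist under `project_root`."""
    root = Path(project_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Only "data" is named in the preparation steps. Add further entries by
    # hand after reading data_folder_structure.txt; none are assumed here.
    expected = ["data"]
    missing = missing_paths(".", expected)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected paths found.")
```

Run it from the project root; an empty result means every listed path exists.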

Command line arguments and examples

usage: main.py [-h] -bs BATCH_SIZE -lr LEARNING_RATE [-wd WEIGHT_DECAY] -ep
               EPOCHS [-es EARLY_STOP] [-cu CUDA] [-mo MODEL] [-fu FUSION]
               [-cl CLIP] [-sc] [-se SEED] [-pa PATIENCE] [-ez] [--loss LOSS]
               [--optim OPTIM] [--threshold THRESHOLD] [--verbose]
               [-mod MODALITIES] [--valid] [--test] [--dataset DATASET]
               [--aligned] [--data-seq-len DATA_SEQ_LEN]
               [--data-folder DATA_FOLDER] [--glove-emo-path GLOVE_EMO_PATH]
               [--cap] [--iemocap4] [--iemocap9] [--zsl ZSL]
               [--zsl-test ZSL_TEST] [--fsl FSL] [--ckpt CKPT] [-dr DROPOUT]
               [-nl NUM_LAYERS] [-hs HIDDEN_SIZE]
               [-hss HIDDEN_SIZES [HIDDEN_SIZES ...]] [-bi] [--gru]
               [--hidden-dim HIDDEN_DIM]

Multimodal Emotion Recognition

optional arguments:
  -h, --help            show this help message and exit
  -bs BATCH_SIZE, --batch-size BATCH_SIZE
                        Batch size
  -lr LEARNING_RATE, --learning-rate LEARNING_RATE
                        Learning rate
  -wd WEIGHT_DECAY, --weight-decay WEIGHT_DECAY
                        Weight decay
  -ep EPOCHS, --epochs EPOCHS
                        Number of epochs
  -es EARLY_STOP, --early-stop EARLY_STOP
                        Early stop
  -cu CUDA, --cuda CUDA
                        CUDA device number
  -mo MODEL, --model MODEL
                        Model type: mult/rnn/transformer/eea
  -fu FUSION, --fusion FUSION
                        Modality fusion type: ef/lf
  -cl CLIP, --clip CLIP
                        Use clip to gradients
  -sc, --scheduler      Use scheduler to optimizer
  -se SEED, --seed SEED
                        Random seed
  -pa PATIENCE, --patience PATIENCE
                        Patience of the scheduler
  -ez, --exclude-zero   Exclude zero in evaluation
  --loss LOSS           loss function: l1/mse/ce/bce
  --optim OPTIM         optimizer function: adam/sgd
  --threshold THRESHOLD
                        Threshold for multi-label emotion recognition
  --verbose             Verbose mode to print more logs
  -mod MODALITIES, --modalities MODALITIES
                        What modalities to use
  --valid               Valid mode
  --test                Test mode
  --dataset DATASET     Dataset to use
  --aligned             Aligned experiment or not
  --data-seq-len DATA_SEQ_LEN
                        Data sequence length
  --data-folder DATA_FOLDER
                        path for storing the dataset
  --glove-emo-path GLOVE_EMO_PATH
  --cap                 Capitalize the first letter of emotion words
  --iemocap4            Only use 4 emotions in IEMOCAP
  --iemocap9            Only use 9 emotions in IEMOCAP
  --zsl ZSL             Do zero shot learning on which emotion (index)
  --zsl-test ZSL_TEST   Notify which emotion was zsl before
  --fsl FSL             Do few shot learning on which emotion (index)
  --ckpt CKPT
  -dr DROPOUT, --dropout DROPOUT
  -nl NUM_LAYERS, --num-layers NUM_LAYERS
                        num of layers of LSTM
  -hs HIDDEN_SIZE, --hidden-size HIDDEN_SIZE
                        hidden vector size of LSTM
  -bi, --bidirectional  Use Bi-LSTM
  --gru                 Use GRU rather than LSTM
  --hidden-dim HIDDEN_DIM
                        Transformers hidden unit size
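To show how a few of the flags listed above behave when parsed, here is a minimal `argparse` sketch. It is not the project's actual parser: the flag names are taken from the help text, but the types and defaults are assumptions, and only a small subset of the options is declared.

```python
import argparse


def build_parser():
    """Declare a small subset of the flags shown in the help text above."""
    parser = argparse.ArgumentParser(description="Multimodal Emotion Recognition")
    # The usage line shows these three without brackets, so they are required.
    parser.add_argument("-bs", "--batch-size", type=int, required=True, help="Batch size")
    parser.add_argument("-lr", "--learning-rate", type=float, required=True, help="Learning rate")
    parser.add_argument("-ep", "--epochs", type=int, required=True, help="Number of epochs")
    # Types and defaults below are assumptions made for illustration.
    parser.add_argument("-hss", "--hidden-sizes", type=int, nargs="+", default=None)
    parser.add_argument("-bi", "--bidirectional", action="store_true", help="Use Bi-LSTM")
    parser.add_argument("-mod", "--modalities", type=str, default=None, help="What modalities to use")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-bs=64", "-lr=1e-3", "-ep=100", "--hidden-sizes", "300", "200", "100", "-bi", "-mod=tav"]
    )
    print(args)
```

Note that `--hidden-sizes` takes several space-separated values, so it is written without `=` and ends at the next flag.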

Run the code

main.py is the entry point of the project. Use the corresponding command-line arguments for different purposes.


Training the model on the CMU-MOSEI dataset

python main.py --cuda=0 -bs=64 -lr=1e-3 -ep=100 --model=eea -bi --hidden-sizes 300 200 100 --num-layers=2 --dropout=0.15 --data-folder=./data/cmu-mosei/ --data-seq-len=20 --dataset=mosei_emo --aligned --loss=bce --clip=1.0 --early-stop=8 -mod=tav --patience=5   

Training the model on the IEMOCAP dataset

python main.py --cuda=0 -bs=64 -lr=1e-3 -ep=100 --model=eea --data-folder=./data/iemocap/ --data-seq-len=50 --dataset=iemocap --loss=bce --clip=1.0 --early-stop=8 --hidden-sizes 300 200 100 -mod=tav --patience=5 --aligned -bi --num-layers=2 --dropout=0.15

Training an early-fusion LSTM baseline

python main.py --cuda=0 -bs=64 -lr=1e-3 -ep=100 --model=rnn --fusion=ef --data-folder=./data/iemocap/ --data-seq-len=50 --dataset=iemocap --loss=bce --clip=1.0 --early-stop=8 --hidden-sizes 300 200 100 -mod=tav --patience=5 --aligned -bi --num-layers=2 --dropout=0.15
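The three training commands above share most of their flags and differ only in model, fusion type, data folder, sequence length, and dataset. The sketch below assembles them from those differing parts. The helper `train_command` and the `COMMON_FLAGS` list are hypothetical names introduced here for illustration; the flag values are copied from the commands above.

```python
import sys

# Flags that appear unchanged in all three example commands above.
COMMON_FLAGS = [
    "--cuda=0", "-bs=64", "-lr=1e-3", "-ep=100",
    "--loss=bce", "--clip=1.0", "--early-stop=8",
    "--hidden-sizes", "300", "200", "100",
    "-mod=tav", "--patience=5", "--aligned", "-bi",
    "--num-layers=2", "--dropout=0.15",
]


def train_command(model, data_folder, seq_len, dataset, fusion=None):
    """Build the argument list for one training run of main.py."""
    cmd = [
        sys.executable, "main.py",
        f"--model={model}",
        f"--data-folder={data_folder}",
        f"--data-seq-len={seq_len}",
        f"--dataset={dataset}",
    ]
    if fusion is not None:
        cmd.append(f"--fusion={fusion}")
    return cmd + COMMON_FLAGS


if __name__ == "__main__":
    runs = [
        train_command("eea", "./data/cmu-mosei/", 20, "mosei_emo"),
        train_command("eea", "./data/iemocap/", 50, "iemocap"),
        train_command("rnn", "./data/iemocap/", 50, "iemocap", fusion="ef"),
    ]
    for cmd in runs:
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the project root to start the run.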

Validating and testing

If you only want to validate or test a trained model, add the --valid or --test flag to the original command and include --ckpt=[PathToSavedCheckpoint] to point to the saved checkpoint of the trained model.
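As a small illustration of the rule above, the sketch below turns a training argument list into a validation or test one by appending the mode flag and the checkpoint path. The function name `eval_command` is hypothetical, and the checkpoint path in the example is a placeholder, not a file shipped with the project.

```python
def eval_command(train_args, ckpt_path, mode="test"):
    """Append --valid or --test plus --ckpt to an existing training command."""
    if mode not in ("valid", "test"):
        raise ValueError("mode must be 'valid' or 'test'")
    return list(train_args) + [f"--{mode}", f"--ckpt={ckpt_path}"]


if __name__ == "__main__":
    # Shortened training command; in practice, reuse the full original one.
    train_args = ["python", "main.py", "--model=eea", "--dataset=iemocap"]
    # "path/to/checkpoint" is a placeholder for your own saved checkpoint.
    print(" ".join(eval_command(train_args, "path/to/checkpoint", mode="test")))
```

The original list is copied rather than modified, so the same training command can be reused for both modes.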

Zero-shot learning (ZSL)

Add --zsl=[EmotionIndex] to the original training command, where EmotionIndex is the index of the emotion category on which to do zero-shot learning. As mentioned in the paper, CMU-MOSEI and IEMOCAP use different strategies, so --zsl=[EmotionIndex] has a slightly different meaning for each dataset. The valid values are listed here:

For CMU-MOSEI (data for the ZSL emotion is removed from the training data):

  • --zsl=0, do ZSL on anger
  • --zsl=1, do ZSL on disgust
  • --zsl=2, do ZSL on fear
  • --zsl=3, do ZSL on happy
  • --zsl=4, do ZSL on sad
  • --zsl=5, do ZSL on surprise

For IEMOCAP (the training data remains unchanged, as the ZSL emotion comes from extra low-resource data):

  • --zsl=1, do ZSL on excited
  • --zsl=4, do ZSL on surprised
  • --zsl=5, do ZSL on frustrated
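The index lists above can be kept as a lookup table so that the flag is built from an emotion name rather than a remembered number. This is a minimal sketch: the helper `zsl_flag` and the table name are hypothetical, and the dataset keys reuse the `--dataset` values from the training commands (`mosei_emo`, `iemocap`).

```python
# Index-to-emotion tables copied from the two lists above.
ZSL_EMOTIONS = {
    "mosei_emo": {0: "anger", 1: "disgust", 2: "fear", 3: "happy", 4: "sad", 5: "surprise"},
    "iemocap": {1: "excited", 4: "surprised", 5: "frustrated"},
}


def zsl_flag(dataset, emotion):
    """Return the --zsl flag for `emotion`, or raise if it is not listed for `dataset`."""
    for index, name in ZSL_EMOTIONS[dataset].items():
        if name == emotion:
            return f"--zsl={index}"
    raise ValueError(f"{emotion!r} is not a listed ZSL emotion for {dataset!r}")


if __name__ == "__main__":
    print(zsl_flag("mosei_emo", "fear"))
    print(zsl_flag("iemocap", "frustrated"))
```

Asking for an emotion that is not listed for a dataset, such as anger on IEMOCAP, raises a `ValueError` instead of silently picking an index.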

Few-shot learning (FSL)

Few-shot learning follows the same logic as ZSL; use --fsl=[EmotionIndex] instead.


Requirements

  1. Python 3.6+
  2. PyTorch 1.4+
  3. Nvidia GTX 1080Ti GPU (or more advanced)