
MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition (IEEE ICPRS 2024)

Website | arXiv | BibTeX
Peihao Xiang, Chaohao Lin, Kaida Wu, and Ou Bai
HCPS Laboratory, Department of Electrical and Computer Engineering, Florida International University

Open in Colab | Hugging Face Datasets

Official TensorFlow implementation and pre-trained VideoMAE models for MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition.

Overview


Illustration of our MultiMAE-DER.

General Multimodal Model vs. MultiMAE-DER. Unlike a general multimodal model, MultiMAE-DER extracts features from cross-domain data with a single encoder, eliminating the need for modality-specific feature extractors.
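
The sketch below illustrates this single-encoder idea under our own assumptions (frame sizes, dimensions, and layer names are illustrative, not the released code): audio is rendered as spectrogram "frames" of the same spatial size as the video frames, both are stacked into one clip, and every frame passes through the same patch embedding so one encoder sees cross-domain data.

```python
# Minimal single-encoder fusion sketch (assumptions, not the official code).
import tensorflow as tf

FRAME_SIZE, PATCH, EMBED_DIM = 224, 16, 384

video_frames = tf.keras.Input(shape=(16, FRAME_SIZE, FRAME_SIZE, 3))  # visual frames
audio_frames = tf.keras.Input(shape=(16, FRAME_SIZE, FRAME_SIZE, 3))  # spectrogram frames

# Early fusion: one multimodal clip along the temporal axis.
clip = tf.keras.layers.Concatenate(axis=1)([video_frames, audio_frames])  # (B, 32, 224, 224, 3)

# One shared patch embedding for both modalities, applied per frame.
patch_embed = tf.keras.layers.Conv2D(EMBED_DIM, kernel_size=PATCH, strides=PATCH)
tokens = tf.keras.layers.TimeDistributed(patch_embed)(clip)               # (B, 32, 14, 14, 384)
tokens = tf.keras.layers.Reshape((-1, EMBED_DIM))(tokens)                 # (B, 6272, 384)

fusion_frontend = tf.keras.Model([video_frames, audio_frames], tokens)
```

Because both modalities share the same projection and the same downstream encoder, no per-modality feature extractor is needed.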


Multimodal Sequence Fusion Strategies.
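
The paper compares several orderings of the fused multimodal sequence (see the figure above). As an illustration only, and not necessarily the exact strategies evaluated, the snippet below shows two possible orderings of visual and spectrogram frames before patch embedding.

```python
# Hedged example of two possible sequence-fusion orderings.
import tensorflow as tf

def fuse_concat(v, a):
    """[V_1..V_T, A_1..A_T]: all visual frames, then all audio frames."""
    return tf.concat([v, a], axis=1)

def fuse_interleave(v, a):
    """[V_1, A_1, V_2, A_2, ...]: alternate visual and audio frames."""
    stacked = tf.stack([v, a], axis=2)                              # (B, T, 2, H, W, C)
    new_shape = tf.concat([[tf.shape(v)[0], -1], tf.shape(v)[2:]], axis=0)
    return tf.reshape(stacked, new_shape)                           # (B, 2T, H, W, C)

v = tf.random.normal((2, 8, 224, 224, 3))   # visual frames
a = tf.random.normal((2, 8, 224, 224, 3))   # spectrogram frames
print(fuse_concat(v, a).shape)              # (2, 16, 224, 224, 3)
print(fuse_interleave(v, a).shape)          # (2, 16, 224, 224, 3)
```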

Implementation details


The architecture of MultiMAE-DER.
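
For orientation, here is an assumption-level approximation of the fine-tuning pipeline rather than the released model: the fused multimodal token sequence is passed through a small ViT-style encoder (standing in for the pre-trained VideoMAE encoder), pooled, and classified into discrete emotion categories. Depth, dimensions, and hyperparameters are placeholders.

```python
# Hedged end-to-end sketch of encoder + classification head (not the released weights).
import tensorflow as tf

NUM_CLASSES = 8                 # e.g. the 8 RAVDESS emotion labels
SEQ_LEN, EMBED_DIM = 1568, 384  # illustrative sequence length and width

def transformer_block(x, num_heads=6, mlp_ratio=4):
    # Pre-norm Transformer block (standard ViT-style layer).
    h = tf.keras.layers.LayerNormalization()(x)
    h = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                           key_dim=EMBED_DIM // num_heads)(h, h)
    x = x + h
    h = tf.keras.layers.LayerNormalization()(x)
    h = tf.keras.layers.Dense(EMBED_DIM * mlp_ratio, activation="gelu")(h)
    h = tf.keras.layers.Dense(EMBED_DIM)(h)
    return x + h

tokens = tf.keras.Input(shape=(SEQ_LEN, EMBED_DIM))   # fused multimodal tokens
x = tokens
for _ in range(2):                                    # depth reduced for the sketch
    x = transformer_block(x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
logits = tf.keras.layers.Dense(NUM_CLASSES)(x)

model = tf.keras.Model(tokens, logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
```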

Main Results

RAVDESS

Results on RAVDESS.

CREMA-D

Results on CREMA-D.

IEMOCAP

Results on IEMOCAP.

Contact

If you have any questions, please feel free to reach out to me at pxian001@fiu.edu.

Acknowledgments

This project is built upon VideoMAE and MAE-DFER. Thanks for their great codebases.

License

This project is released under the Apache License 2.0. See LICENSE for details.

Citation

If you find this repository helpful, please consider citing our work:

@misc{xiang2024multimaeder,
      title={MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition}, 
      author={Peihao Xiang and Chaohao Lin and Kaida Wu and Ou Bai},
      year={2024},
      eprint={2404.18327},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}