
MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition (IEEE ICPRS 2024)

Website | arXiv | BibTeX
Peihao Xiang, Chaohao Lin, Kaida Wu, and Ou Bai
HCPS Laboratory, Department of Electrical and Computer Engineering, Florida International University

Open in Colab | Hugging Face Datasets

Official TensorFlow implementation and pre-trained VideoMAE models for MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition.

Overview


Illustration of our MultiMAE-DER.

General Multimodal Model vs. MultiMAE-DER. Unlike a general multimodal model, MultiMAE-DER extracts features from cross-domain data with a single encoder, eliminating the need for modality-specific feature extractors.
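
The sketch below illustrates this single-encoder idea under our own assumptions (frame sizes, dimensions, and layer names are illustrative, not the released code): audio is rendered as spectrogram "frames" of the same spatial size as the video frames, both are stacked into one clip, and every frame passes through the same patch embedding so one encoder sees cross-domain data.

```python
# Minimal single-encoder fusion sketch (assumptions, not the official code).
import tensorflow as tf

FRAME_SIZE, PATCH, EMBED_DIM = 224, 16, 384

video_frames = tf.keras.Input(shape=(16, FRAME_SIZE, FRAME_SIZE, 3))  # visual frames
audio_frames = tf.keras.Input(shape=(16, FRAME_SIZE, FRAME_SIZE, 3))  # spectrogram frames

# Early fusion: one multimodal clip along the temporal axis.
clip = tf.keras.layers.Concatenate(axis=1)([video_frames, audio_frames])  # (B, 32, 224, 224, 3)

# One shared patch embedding for both modalities, applied per frame.
patch_embed = tf.keras.layers.Conv2D(EMBED_DIM, kernel_size=PATCH, strides=PATCH)
tokens = tf.keras.layers.TimeDistributed(patch_embed)(clip)               # (B, 32, 14, 14, 384)
tokens = tf.keras.layers.Reshape((-1, EMBED_DIM))(tokens)                 # (B, 6272, 384)

fusion_frontend = tf.keras.Model([video_frames, audio_frames], tokens)
```

Because both modalities share the same projection and the same downstream encoder, no per-modality feature extractor is needed.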


Multimodal Sequence Fusion Strategies.
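
The paper compares several orderings of the fused multimodal sequence (see the figure above). As an illustration only, and not necessarily the exact strategies evaluated, the snippet below shows two possible orderings of visual and spectrogram frames before patch embedding.

```python
# Hedged example of two possible sequence-fusion orderings.
import tensorflow as tf

def fuse_concat(v, a):
    """[V_1..V_T, A_1..A_T]: all visual frames, then all audio frames."""
    return tf.concat([v, a], axis=1)

def fuse_interleave(v, a):
    """[V_1, A_1, V_2, A_2, ...]: alternate visual and audio frames."""
    stacked = tf.stack([v, a], axis=2)                              # (B, T, 2, H, W, C)
    new_shape = tf.concat([[tf.shape(v)[0], -1], tf.shape(v)[2:]], axis=0)
    return tf.reshape(stacked, new_shape)                           # (B, 2T, H, W, C)

v = tf.random.normal((2, 8, 224, 224, 3))   # visual frames
a = tf.random.normal((2, 8, 224, 224, 3))   # spectrogram frames
print(fuse_concat(v, a).shape)              # (2, 16, 224, 224, 3)
print(fuse_interleave(v, a).shape)          # (2, 16, 224, 224, 3)
```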

Implementation details


The architecture of MultiMAE-DER.
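
For orientation, here is an assumption-level approximation of the fine-tuning pipeline rather than the released model: the fused multimodal token sequence is passed through a small ViT-style encoder (standing in for the pre-trained VideoMAE encoder), pooled, and classified into discrete emotion categories. Depth, dimensions, and hyperparameters are placeholders.

```python
# Hedged end-to-end sketch of encoder + classification head (not the released weights).
import tensorflow as tf

NUM_CLASSES = 8                 # e.g. the 8 RAVDESS emotion labels
SEQ_LEN, EMBED_DIM = 1568, 384  # illustrative sequence length and width

def transformer_block(x, num_heads=6, mlp_ratio=4):
    # Pre-norm Transformer block (standard ViT-style layer).
    h = tf.keras.layers.LayerNormalization()(x)
    h = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                           key_dim=EMBED_DIM // num_heads)(h, h)
    x = x + h
    h = tf.keras.layers.LayerNormalization()(x)
    h = tf.keras.layers.Dense(EMBED_DIM * mlp_ratio, activation="gelu")(h)
    h = tf.keras.layers.Dense(EMBED_DIM)(h)
    return x + h

tokens = tf.keras.Input(shape=(SEQ_LEN, EMBED_DIM))   # fused multimodal tokens
x = tokens
for _ in range(2):                                    # depth reduced for the sketch
    x = transformer_block(x)
x = tf.keras.layers.GlobalAveragePooling1D()(x)
logits = tf.keras.layers.Dense(NUM_CLASSES)(x)

model = tf.keras.Model(tokens, logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
```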

Main Results

RAVDESS

Results on RAVDESS.

CREMA-D

Results on CREMA-D.

IEMOCAP

Results on IEMOCAP.

Contact

If you have any questions, please feel free to reach out to me at pxian001@fiu.edu.

Acknowledgments

This project is built upon VideoMAE and MAE-DFER. Thanks for their great codebases.

License

This project is released under the Apache License 2.0. See LICENSE for details.

Citation

If you find this repository helpful, please consider citing our work:

@misc{xiang2024multimaeder,
      title={MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition}, 
      author={Peihao Xiang and Chaohao Lin and Kaida Wu and Ou Bai},
      year={2024},
      eprint={2404.18327},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}