EMO-AffectNetModel

Dynamic and static models for real-time facial emotion recognition


In Search of a Robust Facial Expressions Recognition Model: A Large-Scale Visual Cross-Corpus Study


(demo GIFs: test_4_AffWild2, test_2_AffWild2, test_3_AffWild2)

In this paper, we present the largest visual emotion recognition cross-corpus study to date. We propose a novel and effective end-to-end emotion recognition framework consisting of two key elements, each serving a different function:

(1) the backbone emotion recognition model, which is based on the VGGFace2 (Cao et al., 2018) ResNet50 model (He et al., 2016), trained in a balanced way, and is able to predict emotion from the raw image with high performance;

(2) the temporal block stacked on top of the backbone model and trained with dynamic visual emotional datasets (RAVDESS (Livingstone et al., 2018), CREMA-D (Cao et al., 2014), SAVEE (Haq et al., 2008), RAMAS (Perepelkina et al., 2018), IEMOCAP (Busso et al., 2008), Aff-Wild2 (Kollias et al., 2018)) using the cross-corpus protocol in order to show its reliability and effectiveness.
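To make the two-stage design concrete, below is a minimal Keras sketch of a backbone-plus-temporal stack: a ResNet50 trunk produces one embedding per frame and an LSTM aggregates the sequence. The sequence length, layer sizes, weights, and the seven-class output are illustrative assumptions, not the exact configuration used in the paper.

```python
# Minimal sketch (not the authors' exact architecture): a frame-level CNN
# backbone wrapped with TimeDistributed, followed by an LSTM temporal block.
import tensorflow as tf

NUM_CLASSES = 7      # illustrative number of emotion categories
SEQ_LEN = 10         # illustrative number of frames per clip

# Backbone: ResNet50 trunk producing one embedding per frame
# (the paper uses VGGFace2 ResNet50 weights; weights=None keeps this sketch self-contained).
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights=None, pooling="avg",
    input_shape=(224, 224, 3))

frames = tf.keras.Input(shape=(SEQ_LEN, 224, 224, 3))
embeddings = tf.keras.layers.TimeDistributed(backbone)(frames)   # (batch, SEQ_LEN, 2048)
x = tf.keras.layers.LSTM(512)(embeddings)                        # temporal block
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = tf.keras.Model(frames, outputs)
model.summary()
```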

During the research, the backbone model was fine-tuned on AffectNet (Mollahosseini et al., 2019), the largest facial expression dataset, which consists of static images. Our backbone model achieved an accuracy of 66.4% on the AffectNet validation set; with the label smoothing technique, accuracy rose to 66.5%.
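The label smoothing variant can be reproduced in spirit with the built-in option of Keras' categorical cross-entropy, which moves a small amount of probability mass from the true class to the remaining classes. The smoothing factor below is an illustrative value, not necessarily the one behind the 66.5% result.

```python
import tensorflow as tf

# Label-smoothed cross-entropy: each one-hot target keeps 1 - eps on the
# true class and spreads eps over the other classes (eps = 0.1 is illustrative).
loss_fn = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)

y_true = tf.constant([[0., 1., 0., 0., 0., 0., 0.]])                  # one-hot, 7 classes
y_pred = tf.constant([[0.05, 0.70, 0.05, 0.05, 0.05, 0.05, 0.05]])    # model probabilities
print(float(loss_fn(y_true, y_pred)))
```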

In this GitHub repository we release for common use (for research purposes only) the backbone emotion recognition model and the six LSTM models obtained from the leave-one-corpus-out cross-validation experiment.

Table. Results (Unweighted average recall, UAR) of leave-one-corpus-out cross-validation

| Train datasets | Test dataset | Model name | UAR, % |
|---|---|---|---|
| RAVDESS, CREMA-D, SAVEE, RAMAS, IEMOCAP | Aff-Wild2 | Aff-Wild2 | 51.6 |
| Aff-Wild2, CREMA-D, SAVEE, RAMAS, IEMOCAP | RAVDESS | RAVDESS | 65.8 |
| Aff-Wild2, RAVDESS, SAVEE, RAMAS, IEMOCAP | CREMA-D | CREMA-D | 60.6 |
| Aff-Wild2, RAVDESS, CREMA-D, RAMAS, IEMOCAP | SAVEE | SAVEE | 76.1 |
| Aff-Wild2, RAVDESS, CREMA-D, SAVEE, IEMOCAP | RAMAS | RAMAS | 44.3 |
| Aff-Wild2, RAVDESS, CREMA-D, SAVEE, RAMAS | IEMOCAP | IEMOCAP | 25.1 |
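UAR is the recall averaged over classes with equal weight (macro-averaged recall), so every emotion contributes equally to the score regardless of how many samples it has. Assuming scikit-learn is available, it can be computed as follows.

```python
# Unweighted average recall (UAR) = recall averaged over classes with equal
# weight, i.e. macro-averaged recall.
from sklearn.metrics import recall_score

y_true = [0, 0, 1, 1, 2, 2]          # toy labels, 3 classes
y_pred = [0, 1, 1, 1, 2, 0]
uar = recall_score(y_true, y_pred, average="macro")
print(f"UAR: {uar:.1%}")
```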

We provide two static (backbone) models trained with the TensorFlow framework. Both TensorFlow models have been converted to TorchScript models. To evaluate all four models on the AffectNet validation set, run `check_tf_torch_models_on_Affectnet.ipynb`.
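For PyTorch users, the converted TorchScript backbones load with `torch.jit.load`. The file name and preprocessing in the sketch below are placeholders; follow `check_tf_torch_models_on_Affectnet.ipynb` for the exact steps.

```python
# Sketch: loading a converted TorchScript backbone and classifying one face
# crop. The file name and normalization are placeholders, not the real ones.
import torch

model = torch.jit.load("backbone_torchscript.pt")   # hypothetical file name
model.eval()

face = torch.rand(1, 3, 224, 224)                   # stand-in for a preprocessed face crop
with torch.no_grad():
    probs = torch.softmax(model(face), dim=1)
print(probs.argmax(dim=1))
```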

To test the static (backbone) models with a webcam, run `check_backbone_models_by_webcam`. Webcam result:

(webcam demo GIF)
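For orientation, a webcam loop of this kind typically looks like the sketch below: grab a frame, detect and crop a face, classify the crop, and overlay the label. The face detector, the emotion label order, and the model path are simplified stand-ins for what the notebook actually does.

```python
# Minimal webcam sketch (simplified stand-in for the notebook): grab frames,
# crop the first detected face, classify it, and overlay the predicted label.
import cv2
import torch

EMOTIONS = ["Neutral", "Happiness", "Sadness", "Surprise",
            "Fear", "Disgust", "Anger"]               # illustrative label set/order
model = torch.jit.load("backbone_torchscript.pt")     # placeholder path
model.eval()
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5)[:1]:
        face = cv2.resize(frame[y:y + h, x:x + w], (224, 224))
        inp = (torch.from_numpy(face[..., ::-1].copy())     # BGR -> RGB
               .permute(2, 0, 1).float().unsqueeze(0) / 255.0)
        with torch.no_grad():
            label = EMOTIONS[int(model(inp).argmax(dim=1))]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
    cv2.imshow("EMO-AffectNet demo", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```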

We provide six temporal (LSTM) models trained with the TensorFlow framework. All TensorFlow models have been converted to TorchScript models. To test the backbone and temporal models together (CNN+LSTM) with a webcam, run `check_temporal_models_by_webcam`. Webcam result:

(webcam demo GIF)
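The temporal setup differs mainly in that per-frame backbone features are buffered into a sliding window before the LSTM sees them. A simplified sketch of that buffering logic follows; the window length, the model files, and the assumption that the backbone returns penultimate-layer features are all illustrative.

```python
# Sketch of the CNN+LSTM inference pattern: keep a sliding window of per-frame
# backbone features and classify the whole window with the temporal model.
from collections import deque
import torch

WINDOW = 10                                             # assumed sequence length
backbone = torch.jit.load("backbone_features.pt")       # placeholder: assumed to
temporal = torch.jit.load("lstm_torchscript.pt")        # return per-frame features
backbone.eval(); temporal.eval()

buffer = deque(maxlen=WINDOW)

def predict_on_frame(face_tensor):
    """face_tensor: preprocessed (1, 3, 224, 224) face crop."""
    with torch.no_grad():
        buffer.append(backbone(face_tensor))            # per-frame feature vector
        if len(buffer) == WINDOW:
            seq = torch.stack(list(buffer), dim=1)      # (1, WINDOW, feat_dim)
            return temporal(seq).argmax(dim=1)          # emotion index for the window
    return None
```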

To predict emotions for all videos in a folder, run the command `python run.py --path_video video/ --path_save report/`.
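If you prefer to stay inside Python (for example in a notebook), the same command can be launched with the standard library; this is simply a convenience wrapper around the call above.

```python
# Run the batch prediction script from Python instead of the shell.
import subprocess

subprocess.run(
    ["python", "run.py", "--path_video", "video/", "--path_save", "report/"],
    check=True)
```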

To demonstrate how our pipeline works, we ran it on several videos from the RAVDESS corpus. The output is:

(example output: results_emo_pred_videos)

To generate a new video file with the emotion prediction visualized on each frame, run the command `python visualization.py`. Below are examples on test videos:

(example videos: 01-01-03-02-02-01-01, 01-01-05-02-01-02-14, 01-01-07-02-02-02-06)
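Under the hood, per-frame visualization amounts to drawing the predicted label on every frame and writing the annotated frames back to a video file. The sketch below illustrates that pattern with OpenCV; the function name and the way predictions are supplied are placeholders rather than the actual interface of visualization.py.

```python
# Simplified sketch of per-frame visualization: read a video, overlay the
# predicted emotion for each frame, and save a new annotated video.
import cv2

def annotate_video(in_path, out_path, frame_labels):
    """frame_labels: list of predicted emotion strings, one per frame (hypothetical input)."""
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for label in frame_labels:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.putText(frame, label, (30, 50), cv2.FONT_HERSHEY_SIMPLEX,
                    1.2, (0, 255, 0), 2)
        out.write(frame)
    cap.release()
    out.release()
```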

Citation

If you use EMO-AffectNetModel in your research, please consider citing our paper. Here is an example BibTeX entry:

@article{RYUMINA2022,
  title        = {In Search of a Robust Facial Expressions Recognition Model: A Large-Scale Visual Cross-Corpus Study},
  author       = {Elena Ryumina and Denis Dresvyanskiy and Alexey Karpov},
  journal      = {Neurocomputing},
  year         = {2022},
  doi          = {10.1016/j.neucom.2022.10.013},
  url          = {https://www.sciencedirect.com/science/article/pii/S0925231222012656},
}

Links to papers