In this paper we present the largest visual emotion recognition cross-corpus study to date. We propose a novel and effective end-to-end emotion recognition framework consisting of two key elements, each serving a different function:
(1) the backbone emotion recognition model, based on the VGGFace2 (Cao et al., 2018) ResNet50 model (He et al., 2016), trained in a class-balanced way, which predicts emotion from a raw face image with high performance;
(2) the temporal block stacked on top of the backbone model and trained on dynamic visual emotional datasets (RAVDESS (Livingstone et al., 2018), CREMA-D (Cao et al., 2014), SAVEE (Haq et al., 2008), RAMAS (Perepelkina et al., 2018), IEMOCAP (Busso et al., 2008), Aff-Wild2 (Kollias et al., 2018)) using a cross-corpus protocol in order to show its reliability and effectiveness (a minimal sketch of this two-stage design is given below).
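For orientation, here is a minimal Keras sketch of the two-stage design. It is an illustration only: the ResNet50 below is the stock `tf.keras.applications` one rather than the VGGFace2-pretrained backbone, and the number of classes, window length, embedding size and LSTM width are assumptions, not the exact values used in the repository.

```python
# Illustrative sketch of a static backbone + temporal LSTM block (not the repository code).
import tensorflow as tf

NUM_EMOTIONS = 7   # assumption: 7 emotion classes
WINDOW = 10        # assumption: frames per sequence window
EMB_DIM = 512      # assumption: size of the per-frame embedding

# Stage 1: static backbone that maps a face crop to an emotion distribution.
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.ResNet50(include_top=False, pooling="avg", weights=None)(inputs)
emb = tf.keras.layers.Dense(EMB_DIM, activation="relu", name="embedding")(x)
probs = tf.keras.layers.Dense(NUM_EMOTIONS, activation="softmax")(emb)
backbone = tf.keras.Model(inputs, probs, name="static_backbone")

# Stage 2: temporal block over per-frame backbone embeddings.
feature_extractor = tf.keras.Model(inputs, emb, name="frame_feature_extractor")
seq_in = tf.keras.Input(shape=(WINDOW, 224, 224, 3))
frame_feats = tf.keras.layers.TimeDistributed(feature_extractor)(seq_in)
h = tf.keras.layers.LSTM(256)(frame_feats)
out = tf.keras.layers.Dense(NUM_EMOTIONS, activation="softmax")(h)
temporal_model = tf.keras.Model(seq_in, out, name="cnn_lstm")
temporal_model.summary()
```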
During the research, the backbone model was fine-tuned on AffectNet (Mollahosseini et al., 2019), the largest facial expression dataset of static images. Our backbone model achieves an accuracy of 66.4% on the AffectNet validation set.
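AffectNet's emotion classes are heavily imbalanced, so "trained in a balanced way" typically means re-weighting or re-sampling the classes. Below is a minimal sketch using inverse-frequency class weights; the exact balancing strategy used in the paper may differ.

```python
# Illustrative class re-weighting for an imbalanced emotion dataset (not the exact paper recipe).
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# train_labels: one integer emotion label per training image (toy example here)
train_labels = np.array([0, 0, 0, 0, 1, 2, 2, 3])

classes = np.unique(train_labels)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=train_labels)
class_weight = dict(zip(classes, weights))
print(class_weight)  # rarer classes receive larger weights

# In Keras the dictionary is passed directly to fit():
# model.fit(train_ds, class_weight=class_weight, ...)
```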
In this GitHub repository we provide (for research purposes only) the backbone emotion recognition model and six LSTM models obtained from the leave-one-corpus-out cross-validation experiment.
Train datasets | Test dataset | Model name | UAR, % |
---|---|---|---|
RAVDESS, CREMA-D, SAVEE, RAMAS, IEMOCAP | Aff-Wild2 | Aff-Wild2 | 51.6 |
Aff-Wild2, CREMA-D, SAVEE, RAMAS, IEMOCAP | RAVDESS | RAVDESS | 65.8 |
Aff-Wild2, RAVDESS, SAVEE, RAMAS, IEMOCAP | CREMA-D | CREMA-D | 60.6 |
Aff-Wild2, RAVDESS, CREMA-D, RAMAS, IEMOCAP | SAVEE | SAVEE | 76.1 |
Aff-Wild2, RAVDESS, CREMA-D, SAVEE, IEMOCAP | RAMAS | RAMAS | 44.3 |
Aff-Wild2, RAVDESS, CREMA-D, SAVEE, RAMAS | IEMOCAP | IEMOCAP | 25.1 |
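UAR (unweighted average recall) is the mean of the per-class recalls, which makes it robust to class imbalance in the test corpus. For reference, it can be computed with scikit-learn's macro-averaged recall:

```python
# UAR = unweighted average recall = macro-averaged recall over emotion classes.
from sklearn.metrics import recall_score

y_true = [0, 0, 1, 1, 2, 2]        # toy ground-truth emotion labels
y_pred = [0, 1, 1, 1, 2, 0]        # toy predictions

uar = recall_score(y_true, y_pred, average="macro")
print(f"UAR = {uar * 100:.1f}%")   # per-class recalls: 0.5, 1.0, 0.5 -> UAR = 66.7%
```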
To check the backbone model on the AffectNet validation set, run `check_valid_set_Affectnet.ipynb`.
To get face areas from video, run `get_face_area.ipynb`.
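For orientation, face crops can be extracted from a video with any standard face detector; the sketch below uses OpenCV's Haar cascade, while `get_face_area.ipynb` may rely on a different (more accurate) detector, so treat this only as an outline of the step. The paths in the commented call are hypothetical.

```python
# Sketch: extract face crops from a video with OpenCV's Haar cascade
# (the repository notebook may use a different face detector).
import os
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_faces(video_path, out_dir, size=(224, 224)):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces[:1]:  # keep the first detected face per frame
            crop = cv2.resize(frame[y:y + h, x:x + w], size)
            cv2.imwrite(os.path.join(out_dir, f"face_{idx:05d}.jpg"), crop)
        idx += 1
    cap.release()

# extract_faces("video/sample.mp4", "faces/")  # hypothetical paths
```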
To test dynamic emotion recognition with one CNN-LSTM model as an example, run `test_LSTM_RAVDESS.ipynb`.
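The general idea of this step is to extract per-frame backbone embeddings for a window of face crops and feed the sequence to the corresponding LSTM model. The sketch below assumes Keras `.h5` files with hypothetical names; `test_LSTM_RAVDESS.ipynb` shows the exact pipeline.

```python
# Sketch of window-level prediction with a backbone + LSTM pair
# (model file names are placeholders; see test_LSTM_RAVDESS.ipynb for the actual pipeline).
import numpy as np
import tensorflow as tf

backbone = tf.keras.models.load_model("models/backbone.h5")        # hypothetical path
lstm_model = tf.keras.models.load_model("models/LSTM_RAVDESS.h5")  # hypothetical path

# Re-use the backbone up to its penultimate layer as a per-frame feature extractor.
feature_extractor = tf.keras.Model(backbone.input, backbone.layers[-2].output)

def predict_window(face_crops):
    """face_crops: array of shape (window, 224, 224, 3), already preprocessed."""
    frame_features = feature_extractor.predict(face_crops, verbose=0)        # (window, emb_dim)
    scores = lstm_model.predict(frame_features[np.newaxis, ...], verbose=0)  # (1, num_emotions)
    return int(np.argmax(scores, axis=-1)[0])

# window = np.stack([...])  # e.g. 10 consecutive face crops produced by get_face_area
# print(predict_window(window))
```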
To predict emotions for all videos in your folder, run `python run.py --path_video video/ --path_save report/`.
To demonstrate how the pipeline works, we ran it on several videos from the RAVDESS corpus. The output is:
To get a new video file with the emotion prediction visualized on each frame, run `python visualization.py` (a rough sketch of this step is given below). Below are examples of test videos:
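For orientation, the visualization step essentially overlays the predicted label on each frame and re-encodes the video. A minimal OpenCV sketch follows; it is not the repository's `visualization.py`, and the function and file names are illustrative.

```python
# Sketch: overlay per-frame emotion labels on a video with OpenCV
# (visualization.py implements the actual rendering; names here are illustrative).
import cv2

def render_predictions(video_in, video_out, frame_labels):
    """frame_labels: list with one predicted emotion string per frame."""
    cap = cv2.VideoCapture(video_in)
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(video_out, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    for label in frame_labels:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.putText(frame, label, (10, 40), cv2.FONT_HERSHEY_SIMPLEX,
                    1.2, (0, 255, 0), 2)
        writer.write(frame)
    cap.release()
    writer.release()

# render_predictions("video/sample.mp4", "report/sample_pred.mp4",
#                    ["Happiness"] * 120)  # hypothetical inputs
```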
If you use the EMO-AffectNetModel in your research, please consider citing our research paper. Here is an example BibTeX entry:
    @article{RYUMINA2022,
      title   = {In Search of a Robust Facial Expressions Recognition Model: A Large-Scale Visual Cross-Corpus Study},
      author  = {Elena Ryumina and Denis Dresvyanskiy and Alexey Karpov},
      journal = {Neurocomputing},
      year    = {2022},
      doi     = {10.1016/j.neucom.2022.10.013},
      url     = {https://www.sciencedirect.com/science/article/pii/S0925231222012656},
    }
- Q. Cao, L. Shen, W. Xie, O. M. Parkhi, A. Zisserman, "VGGFace2: A Dataset for Recognising Faces across Pose and Age," 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), 2018, pp. 67-74, doi: 10.1109/FG.2018.00020.
- K. He, X. Zhang, S. Ren, J. Sun, "Deep Residual Learning for Image Recognition," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
- S. R. Livingstone, F. A. Russo, "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English," PLoS One, vol. 13, no. 5, 2018, doi: 10.1371/journal.pone.0196391.
- H. Cao, D. G. Cooper, M. K. Keutmann, R. C. Gur, A. Nenkova, R. Verma, "CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset," IEEE Transactions on Affective Computing, vol. 5, no. 4, pp. 377-390, 2014, doi: 10.1109/TAFFC.2014.2336244.
- S. Haq, P. Jackson, J. R. Edge, "Audio-visual feature selection and reduction for emotion classification," International Conference on Auditory-Visual Speech Processing, 2008, pp. 185-190.
- O. Perepelkina, E. Kazimirova, M. Konstantinova, "RAMAS: Russian Multimodal Corpus of Dyadic Interaction for Affective Computing," 20th International Conference on Speech and Computer, 2018, pp. 501-510, doi: 10.1007/978-3-319-99579-3_52.
- C. Busso, M. Bulut, C. C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. N. Chang, S. Lee, S. S. Narayanan, "IEMOCAP: Interactive Emotional Dyadic Motion Capture Database," Language Resources and Evaluation, vol. 42, 2008, doi: 10.1007/s10579-008-9076-6.
- D. Kollias, S. Zafeiriou, "Aff-Wild2: Extending the Aff-Wild Database for Affect Recognition," arXiv:1811.07770, 2018, pp. 1-8.
- A. Mollahosseini, B. Hasani, M. H. Mahoor, "AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild," IEEE Transactions on Affective Computing, vol. 10, no. 1, pp. 18-31, 2019, doi: 10.1109/TAFFC.2017.2740923.