TCAE


Self-supervised Representation Learning from Videos for Facial Action Unit Detection, CVPR 2019 (oral)

We propose a Twin-Cycle Autoencoder (TCAE) that learns, in a self-supervised manner, two embeddings that encode the movements of facial action units (AUs) and head motions, respectively.

Given a source and a target facial image, TCAE is tasked with changing the AUs or head pose of the source frame to those of the target frame by predicting the AU-related and pose-related movements, respectively.

The generated AU-changed and pose-changed faces are shown below:

Please refer to the original paper and supplementary file for more examples.

Prerequisites

  • Python 2.x
  • PyTorch 0.4.1

Training the model yourself

  • Download the training dataset: VoxCeleb1/2
  • Extract frames at 1 fps, then detect and align the faces, and organize the face directories in the format data/id09238/VqEJCd7pbgQ/0_folder/img_0_001.jpg (a data-loading sketch following this layout appears after this list)
  • Split the face images into training/validation/test sets with dataset_split.py
  • Train TCAE with self_supervised_train_TCAE.py
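
A minimal sketch of a paired-frame dataset that follows the directory layout above; the class name and sampling details are illustrative, not part of the released code:

```python
import os
import random

from PIL import Image
from torch.utils.data import Dataset


class PairedFrameDataset(Dataset):
    """Yields (source, target) frame pairs drawn from the same video clip.

    Expects the layout described above, e.g.
    data/id09238/VqEJCd7pbgQ/0_folder/img_0_001.jpg.
    Hypothetical helper, not part of the released code.
    """

    def __init__(self, root, transform=None):
        self.transform = transform
        self.clips = []
        for dirpath, _, filenames in os.walk(root):
            frames = sorted(
                os.path.join(dirpath, f)
                for f in filenames if f.endswith('.jpg')
            )
            if len(frames) >= 2:  # need at least two frames to form a pair
                self.clips.append(frames)

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        # Source and target come from the same clip, so identity and
        # illumination are shared while AUs and head pose differ.
        src_path, tgt_path = random.sample(self.clips[idx], 2)
        src = Image.open(src_path).convert('RGB')
        tgt = Image.open(tgt_path).convert('RGB')
        if self.transform is not None:
            src, tgt = self.transform(src), self.transform(tgt)
        return src, tgt
```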

The learned AU embedding from TCAE can be used for both AU detection and facial image retrieval.
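
For example, the embedding can be evaluated with a linear probe for AU detection. A minimal sketch, assuming a trained model that exposes a hypothetical encode_au method returning the AU embedding (the method name, embedding size, and AU count are assumptions):

```python
import torch
import torch.nn as nn


def linear_probe_au(tcae, loader, embed_dim=256, num_aus=12, epochs=1):
    """Train a linear AU classifier on frozen TCAE embeddings.

    `tcae.encode_au` is a stand-in for however the model exposes its
    AU embedding; `loader` yields (images, labels) with labels shaped
    (B, num_aus) as binary AU targets. All names here are hypothetical.
    """
    tcae.eval()
    for p in tcae.parameters():
        p.requires_grad = False  # probe only; the encoder stays frozen

    probe = nn.Linear(embed_dim, num_aus)
    optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
    criterion = nn.BCEWithLogitsLoss()  # multi-label AU detection

    for _ in range(epochs):
        for images, labels in loader:
            with torch.no_grad():
                emb = tcae.encode_au(images)  # (B, embed_dim)
            loss = criterion(probe(emb), labels.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return probe
```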

Pretrained model released!

  • Download the pretrained model and script at: onedrive
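
A minimal sketch of loading the released checkpoint and using the AU embedding for facial image retrieval; the model class, checkpoint structure, and encode_au method are assumptions, so consult the released script for the actual names:

```python
import torch
import torch.nn.functional as F

# Hypothetical names; check the released script for the actual model
# class, checkpoint keys, and embedding method.
model = TCAE()
state = torch.load('tcae_pretrained.pth', map_location='cpu')
model.load_state_dict(state)
model.eval()


def retrieve(query, gallery, topk=5):
    """Rank gallery face tensors by cosine similarity of AU embeddings."""
    with torch.no_grad():
        q = model.encode_au(query.unsqueeze(0))                             # (1, D)
        g = torch.cat([model.encode_au(x.unsqueeze(0)) for x in gallery])   # (N, D)
    sims = F.cosine_similarity(q, g)                                        # (N,)
    return sims.topk(topk).indices
```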

Frequently asked questions

  • If you use the released model, you should align the facial images in the AU datasets with Seetaface.
  • If you train the model yourself, the facial images in VoxCeleb1/2 should be detected, aligned, and cropped, as is done for face recognition (one possible detector is sketched below).
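
For your own preprocessing (not for the released model, which expects Seetaface alignment), a minimal sketch using MTCNN from the facenet-pytorch package as one possible detector/aligner; the crop size and margin are illustrative, and this swap-in requires a newer PyTorch than the training prerequisites above:

```python
from facenet_pytorch import MTCNN
from PIL import Image

# Illustrative settings; not the alignment used by the authors.
mtcnn = MTCNN(image_size=160, margin=20)

img = Image.open('raw_frame.jpg').convert('RGB')
# Detects the face, crops/aligns it, and writes the result to disk.
# Returns None when no face is found.
face = mtcnn(img, save_path='aligned/raw_frame.jpg')
```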

If you use this code in your paper, please cite the following:

@InProceedings{Li_2019_CVPR,
  author    = {Li, Yong and Zeng, Jiabei and Shan, Shiguang and Chen, Xilin},
  title     = {Self-Supervised Representation Learning From Videos for Facial Action Unit Detection},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2019}
}