This is the official repository of our CVPR 2020 paper DeepFaceFlow.
Mohammad Rami Koujan 1,4,
Anastasios Roussos 1,2,4,
Stefanos Zafeiriou 3,4
1 University of Exeter
2 Foundation for Research and Technology - Hellas (FORTH), Greece
3 Imperial College London
4 FaceSoft.io
Dense 3D facial motion capture from only monocular in-the-wild pairs of RGB images is a highly challenging problem with numerous applications, ranging from facial expression recognition to facial reenactment. In this work, we propose DeepFaceFlow, a robust, fast, and highly accurate framework for the dense estimation of 3D non-rigid facial flow between pairs of monocular images. Our DeepFaceFlow framework was trained and tested on two very large-scale facial video datasets, one of them of our own collection and annotation, with the aid of occlusion-aware and 3D-based loss functions. We conduct comprehensive experiments probing different aspects of our approach and demonstrating its improved performance against state-of-the-art flow and 3D reconstruction methods. Furthermore, we incorporate our framework into a state-of-the-art full-head facial video synthesis method and demonstrate that our method better represents and captures facial dynamics, resulting in highly realistic facial video synthesis. Given registered pairs of images, our framework generates 3D flow maps at ~60 fps.
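The abstract mentions an occlusion-aware, 3D-based loss. The paper's exact formulation is not reproduced in this README, so the following is only a hedged illustration of one plausible form: a mean 3D end-point error computed over visible pixels only. It assumes that flow maps are stored as H×W×3 arrays of per-pixel (dx, dy, dz) displacements and that a boolean visibility mask is available. The function name `masked_epe_3d` and the mask convention are hypothetical and are not part of this repository.

```python
import numpy as np


def masked_epe_3d(flow_pred: np.ndarray, flow_gt: np.ndarray, visible: np.ndarray) -> float:
    """Mean 3D end-point error over visible pixels (hypothetical sketch).

    flow_pred, flow_gt: (H, W, 3) arrays of per-pixel 3D displacements (dx, dy, dz).
    visible: (H, W) boolean mask, True where the face is visible (not occluded).
    """
    if flow_pred.shape != flow_gt.shape or flow_pred.ndim != 3 or flow_pred.shape[-1] != 3:
        raise ValueError("flow maps must both have shape (H, W, 3)")
    if visible.shape != flow_pred.shape[:2]:
        raise ValueError("visibility mask must have shape (H, W)")
    # Euclidean length of each pixel's 3D error vector.
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    n_visible = int(visible.sum())
    if n_visible == 0:
        return 0.0
    # Occluded or background pixels are excluded from the average.
    return float(err[visible].sum() / n_visible)


# Usage: one visible pixel has error 5; an occluded pixel's large error is ignored.
gt = np.zeros((4, 4, 3))
pred = np.zeros((4, 4, 3))
pred[0, 0] = [3.0, 4.0, 0.0]
pred[3, 3] = [100.0, 0.0, 0.0]
mask = np.ones((4, 4), dtype=bool)
mask[3, 3] = False
print(masked_epe_3d(pred, gt, mask))  # 5 / 15 visible pixels
```

Restricting the average to visible pixels prevents regions with no reliable ground truth, such as self-occluded parts of the face, from dominating the gradient.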
Our overall framework is shown in the figure above. It takes as input two RGB images I_1 and I_2 and outputs an image F that encodes the per-pixel 3D optical flow from I_1 to I_2. The framework consists of two main stages: 1) 3DMeshReg: 3D shape initialisation and encoding of the reference frame I_1; 2) DeepFaceFlowNet (DFFNet): 3D face flow prediction. The entire framework was first trained in a supervised manner on our collected and annotated dataset (see our paper for more details). It was then fine-tuned on the 4DFAB dataset, after the sequence of scans from each video in that dataset had been registered to our 3D template. Before being fed to the framework, input frames were registered to a 2D template of size 224×224 using the 68-point facial landmark mark-up.
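Registering input frames to the 224×224 template with the 68 landmarks can be approximated with a least-squares similarity transform (rotation, uniform scale, translation). The sketch below uses the Umeyama method in NumPy. It assumes that 68 detected landmarks and 68 corresponding template landmark positions are available as (68, 2) arrays. The function names `estimate_similarity` and `apply_affine` are hypothetical. This repository's actual registration procedure may differ, for example by using a full affine or piecewise warp.

```python
import numpy as np


def estimate_similarity(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares similarity transform mapping src -> dst (Umeyama, 2D).

    src: (N, 2) detected landmarks in the input frame (e.g. N = 68).
    dst: (N, 2) corresponding landmarks in the 224x224 template (hypothetical input).
    Returns a 2x3 matrix M such that dst ~= src @ M[:, :2].T + M[:, 2].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    s_c, d_c = src - mu_s, dst - mu_d
    # Cross-covariance between centred target and source points.
    cov = d_c.T @ s_c / n
    U, S, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so that R is a proper rotation (no reflection).
    D = np.eye(2)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[1, 1] = -1.0
    R = U @ D @ Vt
    var_s = (s_c ** 2).sum() / n
    scale = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - scale * R @ mu_s
    return np.hstack([scale * R, t[:, None]])


def apply_affine(M: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply a 2x3 affine matrix to (N, 2) points."""
    return pts @ M[:, :2].T + M[:, 2]


# Usage with synthetic landmarks: recover a known similarity transform.
rng = np.random.default_rng(0)
landmarks = rng.uniform(0, 640, size=(68, 2))        # stand-in for detected points
theta = 0.2
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
template = 0.35 * landmarks @ R_true.T + np.array([5.0, 10.0])  # stand-in template
M = estimate_similarity(landmarks, template)
print(np.allclose(apply_affine(M, landmarks), template))
```

The returned 2×3 matrix has the layout expected by OpenCV's `cv2.warpAffine(image, M, (224, 224))`, which could then resample the frame into the template coordinate system. A similarity transform keeps facial proportions intact, which is a common choice when only a coarse alignment is needed before a learned model.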
Details about our collected Face3Dvid dataset will be published soon.
If you find our work useful, please cite it as follows:
@InProceedings{Koujan_2020_CVPR,
author = {Koujan, Mohammad Rami and Roussos, Anastasios and Zafeiriou, Stefanos},
title = {DeepFaceFlow: In-the-Wild Dense 3D Facial Motion Estimation},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}