This is the repository for the face depth regressor implementation. Given a single-view input image, the neural network regressor predicts dense per-pixel depth values, from which a 3D reconstruction of the entire face is obtained. The model is trained on synthetic data generated with EG3D.

We train two neural network architectures: an Attention-UNET and the MiDaS-DPT Vision Transformer. The MiDaS architecture recovers mostly coarse face detail: it focuses on predicting dense depth values that capture the general face structure, rather than fine-grained details of the face region (e.g. wrinkles, nose, cheeks, lips). Our best model, the Attention-UNET, reconstructs high-frequency details around the face region better, resulting in better overall 3D reconstructions from the predicted depth maps.

In summary, our models predict dense depth values for a given input image; the depth2mesh algorithm then recovers the 3D face reconstruction as a triangular mesh (a .obj file) directly from the predicted depth map (.png). The code for evaluating our existing models, as well as for training/fine-tuning models on custom data, can be found in the following notebooks.
Below you can find the relevant notebooks:
Description | Link |
---|---|
⭐ Attention-UNET depth regressor training, testing, evaluation | |
🥈 MiDaS (DPT Vision Transformer) depth regressor training, testing, evaluation | |
The notebooks above include step-by-step instructions and interactive walkthroughs to help anyone rapidly set up, train, and generally experiment with these models.
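To make the depth-to-mesh step concrete, the sketch below shows one common way a depth2mesh-style conversion can work: each pixel of the predicted depth map becomes a vertex, and each 2×2 pixel quad is split into two triangles written out in Wavefront .obj format. This is a minimal illustrative implementation, not necessarily the exact algorithm used in this repository; the function name `depth_to_obj` is our own.

```python
import numpy as np


def depth_to_obj(depth: np.ndarray) -> str:
    """Convert an HxW depth map into a Wavefront .obj triangle mesh string.

    Illustrative sketch: one vertex per pixel (x, y, depth), two triangles
    per 2x2 pixel quad. The repository's depth2mesh may differ in detail.
    """
    h, w = depth.shape
    lines = []
    # One vertex per pixel: image coordinates as x/y, depth value as z.
    for y in range(h):
        for x in range(w):
            lines.append(f"v {x} {y} {depth[y, x]:.6f}")
    # Triangulate each quad; .obj vertex indices are 1-based.
    for y in range(h - 1):
        for x in range(w - 1):
            i = y * w + x + 1
            lines.append(f"f {i} {i + 1} {i + w}")
            lines.append(f"f {i + 1} {i + w + 1} {i + w}")
    return "\n".join(lines)


# Usage: load a predicted depth map (e.g. from a .png) as a NumPy array,
# then write the result to disk as a mesh.
# with open("face.obj", "w") as fh:
#     fh.write(depth_to_obj(depth_map))
```

The resulting .obj file can be opened in any standard 3D viewer (e.g. MeshLab or Blender).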
Qualitative results for the best model are shown below. First, we present the model's qualitative performance on synthetic face data.
syntetic_data_results.mp4
Next, we show the qualitative results on real, in-the-wild images.
real_data_results.mp4
A good point of comparison for our work is the UnSup3D paper, which leverages depth regression to reconstruct 3D faces in an unsupervised manner. This repository leverages data generated by the EG3D model and uses the EG3D codebase. Please refer to the project's webpage for more information. For details about our synthetic training dataset, please refer to our detailed repository here