Face2Faceρ: The Official PyTorch Implementation (ECCV 2022)

Environment

  • CUDA 10.2 or above
  • Python 3.8.5
  • pip install -r requirements.txt
    • For visdom, some dependencies may need to be manually downloaded (visdom issue)
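
A quick way to confirm the environment is to check that PyTorch sees a CUDA device. A minimal sketch (only the versions listed above are assumed):

    import torch

    # Training assumes a CUDA device (CUDA 10.2 or above); fitting also supports --device cpu (see Testing).
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA version PyTorch was built with:", torch.version.cuda)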

Training data

Our framework relies on a large video dataset containing many identities, such as VoxCeleb. For each video frame, the following data is required:

  • image: the cropped face image (please refer to the pre-processing steps of Siarohin et al.)
  • landmark: the 2D facial landmark coordinates, obtained by projecting the 3D keypoints on the fitted 3DMM mesh into image space (please refer to Section 3.3 of our paper; a minimal projection sketch is given after this list)
  • headpose: 3DMM head pose coefficients
  • face mask (optional): the face area mask (can be generated by any face parsing method, such as BiSeNet).
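
For the landmark files, the following is a minimal sketch of the projection step described above. It assumes the selected 3D keypoints of the fitted mesh are already transformed by the estimated head pose and that a scaled-orthographic camera is used (an illustrative assumption, not the exact camera model of the paper; see Section 3.3):

    import numpy as np

    def project_keypoints(keypoints_3d, scale, translation_2d):
        """Project posed 3D mesh keypoints to 2D image coordinates
        (scaled-orthographic camera; an illustrative assumption)."""
        return scale * keypoints_3d[:, :2] + translation_2d

    # e.g. 72 keypoints selected on the fitted 3DMM mesh of one frame
    keypoints_3d = np.random.rand(72, 3)
    landmarks_2d = project_keypoints(keypoints_3d, scale=100.0, translation_2d=np.array([128.0, 128.0]))
    np.savetxt("1.txt", landmarks_2d)  # one landmark file per frame (see the layout below)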

The pre-processed data should be organized as follows (an example dataset containing two video sequences is provided in ./dataset/VoxCeleb):

   - dataset
       - <dataset_name>
           - list.txt                             ---list of all videos 
            - id10001#7w0IBEWc9Qw#000993#001143    ---video folder 1 (should be named <person_id>#<video_id>)
               - img                              ---video frame
                   - 1.jpg
                   - 2.jpg
                   - ...
               - landmark                         ---landmark coordinates for each frame 
                   - 1.txt
                   - 2.txt
                   - ...
               - headpose                         ---head pose coefficients for each frame 
                   - 1.txt
                   - 2.txt
                   - ...
               - mask                             ---face mask for each frame 
                   - 1.png
                   - 2.png
                   - ...
            - id10009#AtavJVP4bCk#012568#012652   ---video folder 2
               ...
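
The layout above can be traversed with a few lines of Python. The following is a minimal sketch (not the repository's actual data loader); it assumes the landmark and head pose files are plain whitespace-separated text:

    import os
    import numpy as np
    from PIL import Image

    def load_video_folder(video_dir):
        """Iterate over the frames of one <person_id>#<video_id> folder."""
        img_dir = os.path.join(video_dir, "img")
        frames = []
        for name in sorted(os.listdir(img_dir), key=lambda n: int(os.path.splitext(n)[0])):
            frame_id = os.path.splitext(name)[0]
            frame = {
                "image": np.asarray(Image.open(os.path.join(img_dir, name))),
                # landmark/headpose are assumed to be whitespace-separated text files
                "landmark": np.loadtxt(os.path.join(video_dir, "landmark", frame_id + ".txt")),
                "headpose": np.loadtxt(os.path.join(video_dir, "headpose", frame_id + ".txt")),
            }
            mask_path = os.path.join(video_dir, "mask", frame_id + ".png")
            if os.path.exists(mask_path):  # the mask folder is optional
                frame["mask"] = np.asarray(Image.open(mask_path))
            frames.append(frame)
        return frames

    frames = load_video_folder("./dataset/VoxCeleb/id10001#7w0IBEWc9Qw#000993#001143")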
                    

Training

  • Set training data
    • Set dataroot in ./src/config/train_face2facerho.ini to ./dataset/<dataset_name> (e.g. dataroot=./dataset/VoxCeleb; a quick check of this and the visdom port is sketched after this list)
  • Set up visdom
    • Set display_port in ./src/config/train_face2facerho.ini to the visdom <port_number>, and run:
    nohup python -m visdom.server -port <port_number> &
  • Start training (tested with Tesla V100)
    python src/train.py --config ./src/config/train_face2facerho.ini
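
Before launching training, the two settings above can be read back from the .ini file as a quick check. A minimal sketch using configparser (the section layout and any other keys of the shipped config are not assumed here):

    import configparser, os

    config = configparser.ConfigParser()
    config.read("./src/config/train_face2facerho.ini")

    # Collect options from whichever section they live in.
    options = {}
    for section in config.sections():
        options.update(dict(config.items(section)))

    assert os.path.isdir(options["dataroot"]), "dataroot should point to ./dataset/<dataset_name>"
    print("visdom display_port:", options["display_port"])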

Testing

  • Prepare models
    • Download our pre-trained models from Google Drive and put them in ./src/checkpoints/voxceleb_face2facerho
    • Download the FLAME model (choose FLAME 2020), unzip it, and copy 'generic_model.pkl' into ./src/external/data
    • Download the pre-trained DECA model and put it in ./src/external/data (no unzipping required)
  • Fit the 3DMM coefficients of the source and driving face images
    • Since the 3DMM fitting algorithm used in our paper is part of our company's in-house facial performance capture system, we can only release it after obtaining our company's official permission. As an alternative, we provide an open-source solution based on DECA.

Note that the resulting quality may be slightly worse with the DECA-based fitting: our original 3DMM fitting algorithm is more accurate and stable than DECA, and the pre-configured 72 keypoints on the FLAME mesh template differ slightly from our original configuration because the mesh templates are different.

  • Run (tested with Nvidia GeForce RTX 2080Ti):

    python src/fitting.py --device <"cpu" or "cuda"> \
    --src_img <src_image_file_name> \
    --drv_img <drv_image_file_name> \
    --output_src_headpose <output_src_headpose_file_name> \
    --output_src_landmark <output_src_landmark_file_name> \
    --output_drv_headpose <output_drv_headpose_file_name> \
    --output_drv_landmark <output_drv_landmark_file_name>
    • Input
      • device: set device, "cpu" or "cuda"
      • src_img: input source actor image
      • drv_img: input driving actor image
      • output_src_headpose: output head pose coefficients of source image (.txt)
      • output_src_landmark: output facial landmarks of source image (.txt)
      • output_drv_headpose: output head pose coefficients of driving image (.txt)
      • output_drv_landmark: output driving facial landmarks (.txt, reconstructed using the shape coefficients of the source actor and the expression and head pose coefficients of the driving actor).
    • Example
      python src/fitting.py --device cuda \
      --src_img ./test_case/source/source.jpg --drv_img ./test_case/driving/driving.jpg \
      --output_src_headpose ./test_case/source/FLAME/headpose.txt --output_src_landmark ./test_case/source/FLAME/landmark.txt \
      --output_drv_headpose ./test_case/driving/FLAME/headpose.txt --output_drv_landmark ./test_case/driving/FLAME/landmark.txt 
  • Get the final reenacted result (tested with Nvidia GeForce RTX 2080Ti):

    python src/reenact.py --config ./src/config/test_face2facerho.ini \
    --src_img <src_image_file_name> \
    --src_headpose <src_headpose_file_name> \
    --src_landmark <src_landmark_file_name> \
    --drv_headpose <drv_headpose_file_name> \
    --drv_landmark <drv_landmark_file_name> \
    --output_dir <output_dir>
    • Input
      • src_img: input source actor image
      • src_headpose: input head pose coefficients of source image (.txt)
      • src_landmark: input facial landmarks of source image (.txt)
      • drv_headpose: input head pose coefficients of driving image (.txt)
      • drv_landmark: input driving facial landmarks (reconstructed using the shape coefficients of the source actor and the expression and head pose coefficients of the driving actor).
      • output_dir: output image (named "result.png") will be saved in this folder.
    • Example
      • Run using 3DMM fitting results from our original 3DMM fitting algorithm (results are pre-saved in ./test_case/source/original and ./test_case/driving/original)
        python src/reenact.py --config ./src/config/test_face2facerho.ini \
        --src_img ./test_case/source/source.jpg \
        --src_headpose ./test_case/source/original/headpose.txt --src_landmark ./test_case/source/original/landmark.txt \
        --drv_headpose ./test_case/driving/original/headpose.txt --drv_landmark ./test_case/driving/original/landmark.txt \
        --output_dir ./test_case/result
      • Run using 3DMM fitting results from DECA
         python src/reenact.py --config ./src/config/test_face2facerho.ini \
         --src_img ./test_case/source/source.jpg \
         --src_headpose ./test_case/source/FLAME/headpose.txt --src_landmark ./test_case/source/FLAME/landmark.txt \
         --drv_headpose ./test_case/driving/FLAME/headpose.txt --drv_landmark ./test_case/driving/FLAME/landmark.txt \
         --output_dir ./test_case/result
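
The fitting and reenactment steps can also be chained in a single driver script. Below is a minimal sketch using only the command-line flags documented above and the test-case paths; this wrapper script is not part of the repository:

    import subprocess
    import numpy as np

    src_img = "./test_case/source/source.jpg"
    drv_img = "./test_case/driving/driving.jpg"
    src_fit = "./test_case/source/FLAME"
    drv_fit = "./test_case/driving/FLAME"

    # 1. DECA-based fitting: head pose and landmarks for the source and driving images.
    subprocess.run([
        "python", "src/fitting.py", "--device", "cuda",
        "--src_img", src_img, "--drv_img", drv_img,
        "--output_src_headpose", src_fit + "/headpose.txt",
        "--output_src_landmark", src_fit + "/landmark.txt",
        "--output_drv_headpose", drv_fit + "/headpose.txt",
        "--output_drv_landmark", drv_fit + "/landmark.txt",
    ], check=True)

    # Optional sanity check (the landmark files are assumed to be whitespace-separated text).
    print("driving landmarks:", np.loadtxt(drv_fit + "/landmark.txt").shape)

    # 2. Reenact the source actor with the driving head pose and expression.
    subprocess.run([
        "python", "src/reenact.py", "--config", "./src/config/test_face2facerho.ini",
        "--src_img", src_img,
        "--src_headpose", src_fit + "/headpose.txt",
        "--src_landmark", src_fit + "/landmark.txt",
        "--drv_headpose", drv_fit + "/headpose.txt",
        "--drv_landmark", drv_fit + "/landmark.txt",
        "--output_dir", "./test_case/result",  # result.png is written here
    ], check=True)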