StyleFaceV - Official PyTorch Implementation

This repository provides the official PyTorch implementation for the following paper:

StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3
Haonan Qiu, Yuming Jiang, Hang Zhou, Wayne Wu, and Ziwei Liu
Arxiv, 2022.

From MMLab@NTU affliated with S-Lab, Nanyang Technological University and SenseTime Research.

[Project Page] | [Paper] | [Demo Video]

Generated Samples

Updates

[07/2022] Paper and demo video are released.
[07/2022] Code is released.

Installation

Clone this repo:

git clone https://github.com/arthur-qiu/StyleFaceV.git
cd StyleFaceV

Dependencies:

All dependencies for defining the environment are provided in environment/stylefacev.yaml. We recommend using Anaconda to manage the python environment:

conda env create -f ./environment/stylefacev.yaml
conda activate stylefacev

Datasets

Image Data: Unaligned FFHQ

Video Data: RAVDESS

Download the processed video data via this Google Drive or process the data via this repo

Put all the data at the path "../data".

Transform the video data into .png form:

python scripts vid2img.py

Sampling

Pretrained Models

Pretrained models can be downloaded from this Google Drive. Unzip the file and put them under the dataset folder with the following structure:

pretrained_models
├── network-snapshot-005000.pkl  # styleGAN3 checkpoint finetuned on both RAVDNESS and unaligned FFHQ.
├── wing.ckpt                    # Face Alignment model from https://github.com/protossw512/AdaptiveWingLoss.
├── motion_net.pth               # trained motion sampler.
├── pre_net.pth
└── pre_pose_net.pth
checkpoints/stylefacev
├── latest_net_FE.pth            # appearance extractor + recompostion 
├── latest_net_FE_lm.pth         # first half of pose extractor
└── latest_net_FE_pose.pth       # second half of pose extractor

Generating Videos

python test.py --dataroot ../data/actor_align_512_png --name stylefacev \\
    --network_pkl=pretrained_models/network-snapshot-005000.pkl --model sample \\
    --model_names FE,FE_pose,FE_lm --rnn_path pretrained_models/motion_net.pth \\
    --n_frames_G 60 --num_test=64 --results_dir './sample_results/'

Training

Pre Stage

This stage is purely trained on image data and will help the convergence.

python train.py --dataroot ../data/actor_align_512_png --name stylepose \\
    --network_pkl=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-ffhqu-256x256.pkl \\
    --model stylevpose --n_epochs 5 --n_epochs_decay 5
python train.py --dataroot ../data/actor_align_512_png --name stylefacev_pre \\
    --network_pkl=pretrained_models/network-snapshot-005000.pkl \\
    --model stylepre --pose_path checkpoints/stylevpose/latest_net_FE.pth

You can also use pre_net.pth and pre_pose_net.pth from the folder of pretrained_models.

python train.py --dataroot ../data/actor_align_512_png --name stylefacev_pre \\
    --network_pkl=pretrained_models/network-snapshot-005000.pkl --model stylepre \\
    --pre_path pretrained_models/pre_net.pth --pose_path pretrained_models/pre_pose_net.pth

Decomposing and Recomposing Pipeline

python train.py --dataroot ../data/actor_align_512_png --name stylefacev \\
    --network_pkl=pretrained_models/network-snapshot-005000.pkl --model stylefacevadv \\
    --pose_path pretrained_models/pre_pose_net.pth \\
    --pre_path checkpoints/stylefacev_pre/latest_net_FE.pth \\
    --n_epochs 50 --n_epochs_decay 50 --lr 0.0002

Motion Sampler

python train.py --dataroot ../data/actor_align_512_png --name motion \\
    --network_pkl=pretrained_models/network-snapshot-005000.pkl --model stylernn \\
    --pre_path checkpoints/stylefacev/latest_net_FE.pth \\
    --pose_path checkpoints/stylefacev/latest_net_FE_pose.pth \\
    --lm_path checkpoints/stylefacev/latest_net_FE_lm.pth \\
    --n_frames_G 30

Citation

If you find this work useful for your research, please consider citing our paper:

@misc{https://doi.org/10.48550/arxiv.2208.07862,
  doi = {10.48550/ARXIV.2208.07862},
  url = {https://arxiv.org/abs/2208.07862},
  author = {Qiu, Haonan and Jiang, Yuming and Zhou, Hang and Wu, Wayne and Liu, Ziwei},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

mtk380/StyleFaceV