face-vid2vid

Unofficial implementation of the paper "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing" (CVPR 2021 Oral).

Usage

Dataset Preparation

cd datasets
wget https://yt-dl.org/downloads/latest/youtube-dl -O youtube-dl
chmod a+rx youtube-dl
python load_videos.py --workers=8
cd ..
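
Once load_videos.py finishes, the clips should land under datasets/vox/train and datasets/vox/test, which are the paths the evaluation commands below expect. An optional sanity check, assuming that layout:

from pathlib import Path

# Count the downloaded VoxCeleb clips in each split; the datasets/vox layout
# is assumed from the evaluation commands later in this README.
for split in ("train", "test"):
    clips = sorted(Path("datasets/vox", split).glob("*.mp4"))
    print(f"{split}: {len(clips)} clips")
    if clips:
        print("  e.g.", clips[0].name)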

Pretrained Headpose Estimator

Download the Hopenet checkpoint trained on 300W-LP (alpha 1, robust to image quality).

Put hopenet_robust_alpha1.pkl under the repo root.
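
A minimal sketch of loading the checkpoint, assuming the Hopenet class from the DeepHeadPose repository is importable; the constructor arguments below follow that repository, and the exact way this repo consumes the weights may differ:

import torch
import torchvision

from hopenet import Hopenet  # module name assumed from the DeepHeadPose repo

# ResNet-50 backbone with 66 yaw/pitch/roll bins, as in DeepHeadPose.
pose_net = Hopenet(torchvision.models.resnet.Bottleneck, [3, 4, 6, 3], 66)
pose_net.load_state_dict(torch.load("hopenet_robust_alpha1.pkl", map_location="cpu"))
pose_net.eval()  # the estimator only supervises head pose, so keep it frozen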

Train

python train.py --batch_size=4 --gpu_ids=0,1,2,3 --num_epochs=100 (--ckp=10)

The optional --ckp argument resumes training from a saved checkpoint (here, epoch 10). On a 2080 Ti, batch_size=4 uses up almost all of the GPU memory.
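
Before launching a long run, a quick check that the GPUs listed in --gpu_ids are visible to PyTorch:

import torch

# List the CUDA devices PyTorch can see; the ids here are what --gpu_ids refers to.
print(torch.cuda.device_count(), "GPU(s) visible")
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} -> {torch.cuda.get_device_name(i)}")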

Evaluate

Reconstruction:

python evaluate.py --ckp=99 --source=r --driving=datasets/vox/test/id10280#NXjT3732Ekg#001093#001192.mp4

The first frame of the driving video is used as the source by default.
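
One way to put a number on reconstruction quality is the mean per-pixel L1 error between the driving clip and the generated clip. A sketch using imageio; result.mp4 is a hypothetical output path, so point it at wherever evaluate.py writes the reconstruction:

import imageio
import numpy as np

# Compare the driving video with the reconstructed one frame by frame.
# Reading mp4 files requires the imageio-ffmpeg plugin.
driving = np.stack(imageio.mimread(
    "datasets/vox/test/id10280#NXjT3732Ekg#001093#001192.mp4", memtest=False))
result = np.stack(imageio.mimread("result.mp4", memtest=False))  # hypothetical output path

n = min(len(driving), len(result))
l1 = np.abs(driving[:n].astype(np.float32) - result[:n].astype(np.float32)).mean()
print(f"mean L1 over {n} frames: {l1:.2f} (0-255 pixel scale)")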

Motion transfer:

python evaluate.py --ckp=99 --source=test.png --driving=datasets/vox/test/id10280#NXjT3732Ekg#001093#001192.mp4
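
The source image should be a tightly cropped face at the model's training resolution; 256x256 matches the usual VoxCeleb preprocessing but is an assumption here. A small Pillow helper to produce test.png:

from PIL import Image

# Crop to the face region first (e.g. with any face detector), then resize to
# the assumed 256x256 training resolution before passing the result as --source.
img = Image.open("photo.jpg").convert("RGB")  # photo.jpg is a placeholder input
img = img.resize((256, 256), Image.LANCZOS)
img.save("test.png")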

Example after training for 7 days on four 2080 Ti GPUs:

[demo GIF]

Face Frontalization:

python evaluate.py --ckp=99 --source=f --driving=datasets/vox/train/id10192#S5yV10aCP7A#003200#003334.mp4

Acknowledgement

Thanks to NV, Imaginaire, AliaksandrSiarohin and DeepHeadPose