This repo is the official implementation of "MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation" (CVPR 2023).
By Bowen Zhang*, Chenyang Qi*, Pan Zhang, Bo Zhang, HsiangTao Wu, Dong Chen, Qifeng Chen, Yong Wang and Fang Wen.
Paper | Project Page | Code
In this work, we propose an ID-preserving talking head generation framework, which advances previous methods in two aspects. First, as opposed to interpolating from sparse flow, we claim that dense landmarks are crucial to achieving accurate geometry-aware flow fields. Second, inspired by face-swapping methods, we adaptively fuse the source identity during synthesis, so that the network better preserves the key characteristics of the image portrait. Although the proposed model surpasses prior generation fidelity on established benchmarks, to further make the talking head generation qualified for real usage, personalized fine-tuning is usually needed. However, this process is rather computationally demanding that is unaffordable to standard users. To solve this, we propose a fast adaptation model using a meta-learning approach. The learned model can be adapted to a high-quality personalized model as fast as 30 seconds. Last but not the least, a spatial-temporal enhancement module is proposed to improve the fine details while ensuring temporal coherency. Extensive experiments prove the significant superiority of our approach over the state of the arts in both one-shot and personalized settings.
- Release the inference code of base model and temporal super-resolution model
- Release the training code of base model
- Release the training code of super-resolution model
git clone https://github.com/Meta-Portrait/MetaPortrait.git
cd MetaPortrait
conda env create -f environment.yml
conda activate meta_portrait_base
# if you use GPU that only support cuda11, you may reinstall the torch build with cu11
# pip uninstall torch
# pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
Download the checkpoint of base model and put it to base_model/checkpoint
. We provide preprocessed example data for inference, you could download the data, unzip and put it to data
. The directory structure should like this:
data
├── 0
│ ├── imgs
│ │ ├── 00000000.png
│ │ ├── ...
│ ├── ldmks
│ │ ├── 00000000_ldmk.npy
│ │ ├── ...
│ └── thetas
│ ├── 00000000_theta.npy
│ ├── ...
├── src_0_id.npy
├── src_0_ldmk.npy
├── src_0.png
├── src_0_theta.npy
└── src_map_dict.pkl
You could generate results of self reconstruction on 256x256 resolution by running:
cd base_model
python inference.py --save_dir result --config config/meta_portrait_256_eval.yaml --ckpt checkpoint/ckpt_base.pth.tar
Train the warping network first using the following command:
cd base_model
python main.py --config config/meta_portrait_256_pretrain_warp.yaml --fp16 --stage Warp --task Pretrain
Then, modify the path of warp_ckpt
in config/meta_portrait_256_pretrain_full.yaml
and joint train the warping network and refinement network using the following command:
python main.py --config config/meta_portrait_256_pretrain_full.yaml --fp16 --stage Full --task Pretrain
You could start from the standard pretrained checkpoint and further optimize the personalized adaptation speed of the model by utilizing meta-learning using the following command:
python main.py --config config/meta_portrait_256_pretrain_meta_train.yaml --fp16 --stage Full --task Meta --remove_sn --ckpt /path/to/standard_pretrain_ckpt
Set the root path to sr_model
Download the checkpoint using the bash command
cd sr_model
bash download_sr.sh
Unzip the package and keep the file structure like
pretrained_ckpt
├── temporal_gfpgan.pth
├── GFPGANv1.3.pth
...
data
├── HDTF_warprefine
│ ├── gt
│ ├── lq
│ ├── ...
Basicsr
Experimental_root
options
# Install a modified basicsr - https://github.com/xinntao/BasicSR
cd Basicsr
pip install -r requirements.txt
python setup.py develop
# Install facexlib - https://github.com/xinntao/facexlib
# We use face detection and face restoration helper in the facexlib package
pip install facexlib
cd ..
pip install -r requirements.txt
# python setup.py develop
ckpt for inference: pretrained_ckpt/temporal_gfpgan.pth
Enhance the result from our base model without calculating the metrics:
python -m torch.distributed.launch --nproc_per_node=1 --master_port=4321 Experimental_root/test.py -opt options/test/same_id_demo.yml --launcher pytorch
You may adjust the nproc_per_node
to the number of GPUs on your own machine.
Finally, check the result at results/temporal_gfpgan_same_id_temporal_super_resolution
.
In the paper result, we train on the training split of hdtf dataset. Here we first provide a demo training code to train on the small demo dataset
CUDA_VISIBLE_DEVICES=1 python -m torch.distributed.launch --nproc_per_node=1 --master_port=4321 Experimental_root/train.py -opt options/train/train_sr_hdtf.yml --launcher pytorch
The intermediate result can be check at /home/cqiaa/talkinghead/MetaPortrait/sr_model/experiments/train_sr_hdtf/visualization/00000001/hdtf_random
@misc{zhang2022metaportrait,
title={MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation},
author={Bowen Zhang and Chenyang Qi and Pan Zhang and Bo Zhang and HsiangTao Wu and Dong Chen and Qifeng Chen and Yong Wang and Fang Wen},
year={2022},
eprint={2212.08062},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
This code borrows heavily from FOMM, thanks the authors for sharing their code and models.
This is the codebase for our research work. Please open a GitHub issue for any help. If you have any questions regarding the technical details, feel free to contact zhangbowen@mail.ustc.edu.cn or cqiaa@connect.ust.hk.