IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis
Jingxiang Sun, Xuan Wang, Yichun Shi, Lizhen Wang, Jue Wang, Yebin Liu
https://mrtornado24.github.io/IDE-3D/
Abstract: Existing 3D-aware facial generation methods face a dilemma in quality versus editability: they either generate editable results in low resolution, or high-quality ones with no editing flexibility. In this work, we propose a new approach that brings the best of both worlds together. Our system consists of three major components: (1) a 3D-semantics-aware generative model that produces view-consistent, disentangled face images and semantic masks; (2) a hybrid GAN inversion approach that initializes the latent codes from the semantic and texture encoder, and further optimizes them for faithful reconstruction; and (3) a canonical editor that enables efficient manipulation of semantic masks in canonical view and produces high-quality editing results. Our approach is competent for many applications, e.g. free-view face drawing, editing and style control. Both quantitative and qualitative results show that our method reaches the state-of-the-art in terms of photorealism, faithfulness and efficiency.
git clone --recursive https://github.com/MrTornado24/IDE-3D.git
cd IDE-3D
conda env create -f environment.yml
Please download our pre-trained checkpoints from link and put them under pretrained_models/. The link mainly contains the pretrained generator ide3d-ffhq-64-512.pkl and the style encoder encoder-base-hybrid.pkl. More pretrained models will be released soon.
# Generate videos using pre-trained model
python gen_videos.py --outdir=out --trunc=0.7 --seeds=0-3 --grid=2x2 \
--network=pretrained_models/ide3d-ffhq-64-512.pkl --interpolate 1 --image_mode image_seg
# Generate the same 4 seeds in an interpolation sequence
python gen_videos.py --outdir=out --trunc=0.7 --seeds=0-3 --grid=1x1 \
--network=pretrained_models/ide3d-ffhq-64-512.pkl --interpolate 1 --image_mode image_seg
# Generate images using pre-trained model
python gen_images.py --outdir=out --trunc=0.7 --seeds=0-3 \
--network=pretrained_models/ide3d-ffhq-64-512.pkl
# Extract shapes (saved as .mrc and .npy) using pre-trained model
python extract_shapes.py --outdir out --trunc 0.7 --seeds 0-3 \
--network networks/network_snapshot.pkl --cube_size 1
# Render meshes to video
python render_mesh.py --fname out/0.npy --outdir out
We visualize our .mrc shape files with UCSF ChimeraX. Please refer to EG3D for detailed instructions on using ChimeraX.
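The commands above pass seeds as a compact string such as --seeds=0-3. As a minimal sketch of how such a string can be expanded, assuming the scripts follow the StyleGAN-style convention of comma-separated values and inclusive ranges, the snippet below uses a hypothetical helper named parse_seed_spec. It is an illustration only and is not taken from this repository.

```python
import re
from typing import List

# Matches an inclusive range such as "0-3".
_RANGE_RE = re.compile(r"^(\d+)-(\d+)$")


def parse_seed_spec(spec: str) -> List[int]:
    """Expand a seed string like '0-3' or '1,4-6' into a list of ints.

    Hypothetical helper: assumes comma-separated entries where each entry
    is either a single integer or an inclusive 'lo-hi' range.
    """
    seeds: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        match = _RANGE_RE.match(part)
        if match:
            lo, hi = int(match.group(1)), int(match.group(2))
            seeds.extend(range(lo, hi + 1))  # ranges are inclusive
        else:
            seeds.append(int(part))
    return seeds


if __name__ == "__main__":
    print(parse_seed_spec("0-3"))
```

Under this assumption, --seeds=0-3 corresponds to the four seeds 0, 1, 2 and 3, which fill the 2x2 grid in the first video command.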
We provide an interactive tool for real-time 3D-aware face drawing and editing. Before using it, please install the environment with pip install -r ./Painter/requirements.txt.
python Painter/run_ui.py \
--g_ckpt pretrained_models/ide3d-ffhq-64-512.pkl \
--e_ckpt pretrained_models/encoder-base-hybrid.pkl
FFHQ: Download and process the Flickr-Faces-HQ dataset following EG3D. Then, parse semantic masks for all processed images using a pretrained parsing model. You can download dataset.json for FFHQ here. The processed data should be organized as:
├── /path/to/dataset
│ ├── masks512x512
│ ├── maskscolor512x512
│ ├── images512x512
│ │ ├── 00000
│ │ │ ├── img00000000.png
│ │ ├── ...
│ │ ├── dataset.json
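The layout above can be checked before running any script that reads the dataset. The following is a minimal sketch using only the Python standard library. The helper name missing_dataset_entries is hypothetical, and the location of dataset.json inside images512x512 is an assumption read off the tree as shown.

```python
from pathlib import Path
from typing import List

# Folder names taken from the directory tree above.
EXPECTED_DIRS = ("masks512x512", "maskscolor512x512", "images512x512")


def missing_dataset_entries(root: str) -> List[str]:
    """Return the expected entries that are absent under `root`.

    Hypothetical helper, not part of the IDE-3D code base.
    """
    root_path = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root_path / d).is_dir()]
    # Assumption: dataset.json lives inside images512x512, as the tree suggests.
    if not (root_path / "images512x512" / "dataset.json").is_file():
        missing.append("images512x512/dataset.json")
    return missing
```

Calling missing_dataset_entries("/path/to/dataset") returns an empty list when all three folders and dataset.json are present, and otherwise lists what still needs to be prepared.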
Custom dataset: You can process your own dataset using the following commands. This is useful for editing real portrait images.
cd dataset_preprocessing/ffhq
python preprocess_in_the_wild.py --indir=INPUT_IMAGE_FOLDER
IDE-3D supports 3D-aware editing of real portrait images using our interactive tool. Please run the following commands:
# infer latent code as initialization
python apps/infer_hybrid_encoder.py \
--target_img /path/to/img_0.png \
--g_ckpt pretrained_models/ide3d-ffhq-64-512.pkl \
--e_ckpt pretrained_models/encoder-base-hybrid.pkl \
--outdir out
The above command returns rec_ws.pt under out/img_0.
# run pti
python inversion/scripts/run_pti.py \
--run_name ide3d_plus_initial_code \
--projector_type ide3d_plus \
--pivotal_tuning \
--viz_image \
--viz_mesh \
--viz_video \
--label_path /path/to/dataset.json \
--image_name img_0 \
--initial_w out/img_0/rec_ws.pt
We adopt PTI for 3D inversion. Before running, please place the images into examples/. You can pass ide3d_plus or ide3d to --projector_type to choose between the inversion types ('w' and 'w+'). The flag --initial_w specifies the latent code obtained in the previous step. It helps produce a more plausible shape, especially for images with steep viewing angles. The command returns the pose label label.pt, the reconstructed latent code latent.pt, the finetuned generator and some visualizations.
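The two steps above (encoder initialization, then PTI) can be chained from a small driver script. The sketch below only assembles the argument lists from the flags shown in this README. The function name build_inversion_commands is hypothetical, and the derived path out/img_0/rec_ws.pt assumes the encoder writes its output under a folder named after the image, as described above.

```python
import shlex
from pathlib import Path
from typing import List


def build_inversion_commands(
    target_img: str,
    label_path: str,
    outdir: str = "out",
    g_ckpt: str = "pretrained_models/ide3d-ffhq-64-512.pkl",
    e_ckpt: str = "pretrained_models/encoder-base-hybrid.pkl",
) -> List[List[str]]:
    """Build argv lists for the encoder step and the PTI step.

    Hypothetical helper: it mirrors the commands in this README and
    does not call into the IDE-3D code itself.
    """
    image_name = Path(target_img).stem  # e.g. "img_0"
    # Assumption: the encoder saves rec_ws.pt under <outdir>/<image_name>/.
    initial_w = (Path(outdir) / image_name / "rec_ws.pt").as_posix()
    encode = [
        "python", "apps/infer_hybrid_encoder.py",
        "--target_img", target_img,
        "--g_ckpt", g_ckpt,
        "--e_ckpt", e_ckpt,
        "--outdir", outdir,
    ]
    pti = [
        "python", "inversion/scripts/run_pti.py",
        "--run_name", "ide3d_plus_initial_code",
        "--projector_type", "ide3d_plus",
        "--pivotal_tuning", "--viz_image", "--viz_mesh", "--viz_video",
        "--label_path", label_path,
        "--image_name", image_name,
        "--initial_w", initial_w,
    ]
    return [encode, pti]


if __name__ == "__main__":
    for cmd in build_inversion_commands("/path/to/img_0.png", "/path/to/dataset.json"):
        print(shlex.join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root, so that a failure in the encoder step stops the script before PTI starts.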
# (optional) finetune encoder
python apps/finetune_hybrid_encoder.py \
--target_img /path/to/img_0.png \
--target_code /path/to/latent.pt \
--target_label /path/to/label.pt \
--g_ckpt /path/to/finetuned_generator.pt \
--e_ckpt pretrained_models/encoder-base-hybrid.pkl \
--outdir out \
--max-steps 1000
This step aligns the shape reconstructed by the encoder with the one reconstructed by PTI. The finetuned encoder is saved as finetuned_encoder.pkl, and a semantic mask mask.png is saved in the same folder.
# run UI
python Painter/run_ui.py \
--g_ckpt /path/to/finetuned_generator.pt \
--e_ckpt /path/to/finetuned_encoder.pkl \
--target_code /path/to/latent.pt \
--target_label /path/to/label.pt \
--inversion
Note that you should click Open Image and load the mask.png returned by the previous step.
Please obtain the adapted generators following IDE3D-NADA. You can perform interactive editing in other domains by simply replacing the original generator with the adapted one:
python Painter/run_ui.py \
--g_ckpt /path/to/adapted_generator.pt \
--e_ckpt pretrained_models/encoder-base-hybrid.pkl
IDE-3D supports animating stylized virtual faces through semantic masks. Please process a video clip and prepare a dataset.json. Then run:
python apps/infer_face_animation.py \
--drive_root /path/to/images \
--network pretrained_models/ide3d-ffhq-64-512.pkl \
--encoder pretrained_models/encoder-base-hybrid.pkl \
--grid 4x1 \
--seeds 52,197,229 \
--outdir out
Training scripts will be released soon.
Parts of the code are borrowed from StyleGAN3, PTI, EG3D and StyleGAN-nada.
If you use this code for your research, please cite the following works:
@article{sun2022ide,
title = {IDE-3D: Interactive Disentangled Editing for High-Resolution 3D-aware Portrait Synthesis},
author = {Sun, Jingxiang and Wang, Xuan and Shi, Yichun and Wang, Lizhen and Wang, Jue and Liu, Yebin},
journal = {ACM Transactions on Graphics (TOG)},
volume = {41},
number = {6},
articleno = {270},
pages = {1--10},
year = {2022},
publisher = {ACM New York, NY, USA},
doi={10.1145/3550454.3555506},
}
@inproceedings{sun2022fenerf,
title={Fenerf: Face editing in neural radiance fields},
author={Sun, Jingxiang and Wang, Xuan and Zhang, Yong and Li, Xiaoyu and Zhang, Qi and Liu, Yebin and Wang, Jue},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={7672--7682},
year={2022}
}