NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation, CVPR'23

Teaser video: teaser.mov

This is the official PyTorch implementation of our NeRFInvertor paper:

Y. Yin, K. Ghasedi, H. Wu, J. Yang, X. Tong, Y. Fu, NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

Abstract: NeRF-based generative models (NeRF-GANs) have shown impressive capacity in generating high-quality images with consistent 3D geometry. In this paper, we propose a universal method to surgically fine-tune these NeRF-GANs in order to achieve high-fidelity animation of real subjects given only a single image. Given the optimized latent code for an out-of-domain real image, we employ 2D loss functions on the rendered image to reduce the identity gap. Furthermore, our method leverages explicit and implicit 3D regularizations, using in-domain neighborhood samples around the optimized latent code, to remove geometrical and visual artifacts.

Recent Updates

2023.06.01: Released inversion code for GRAM.


Requirements

  • Currently only Linux is supported.
  • 64-bit Python 3.8 installation or newer. We recommend using Anaconda3.
  • One or more high-end NVIDIA GPUs, NVIDIA drivers, and CUDA toolkit 10.1 or newer. We recommend using Tesla V100 GPUs with 32 GB memory for training to reproduce the results in the paper.

Installation

Clone the repository and set up a conda environment with all dependencies as follows:

git clone https://github.com/YuYin1/NeRFInvertor.git
cd NeRFInvertor
conda env create -f environment.yml
source activate nerfinvertor
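
Before moving on, a quick sanity check (not part of the original instructions) confirms that the environment is active and PyTorch can see your GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"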

Preparation

We provide the various auxiliary models needed for the NeRF-GAN inversion task, including the NeRF-based generators and the pre-trained models used for loss computation.

Pretrained NeRF-GANs

Model        Dataset   Resolution   Download
GRAM         FFHQ      256x256      GitHub link
GRAM         Cats      256x256      GitHub link
EG3D         FFHQ      256x256      GitHub link
AniFaceGAN   FFHQ      512x512      GitHub link
ArcFace      --        --           GitHub link
All of the above models are summarized at this GitHub link.
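
Once downloaded, you can sanity-check a checkpoint before running inversion. This is a hypothetical sketch, assuming the .pth files are standard PyTorch checkpoints; their exact contents (plain state dict vs. wrapped object) depend on the repository code:

import torch

# Inspect a downloaded GRAM generator checkpoint on the CPU.
ckpt = torch.load('pretrained_models/gram/FFHQ_default/generator.pth',
                  map_location='cpu')
if isinstance(ckpt, dict):
    # Likely a state dict: list a few entry names and shapes.
    for name, value in list(ckpt.items())[:5]:
        print(name, getattr(value, 'shape', type(value)))
else:
    print(type(ckpt))  # e.g., a pickled nn.Module or wrapper object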

Prepare Dataset

  • Sample images: a few test face images are included under samples/ with the following structure:

NeRFInvertor/
│
└─── samples/
    │
    └─── faces/
        │
        ├─── *.png      # original 256x256 images
        │
        ├─── poses/     # estimated face poses
        │    │
        │    └─── *.mat
        │
        └─── mask256/   # face masks
             │
             └─── *.png
  • FFHQ or CelebA-HQ: We additionally provide the FFHQ (Google Drive) and CelebA-HQ (Google Drive) datasets for training and evaluation. Each dataset includes face images, masks, and face poses. Note that the face poses are estimated by Deep3DFaceRecon; see the pose-loading sketch after the tree below. The datasets have the following structure:
datasets/
│
├─── ffhq/
│    │
│    ├─── *.png      # original 256x256 images
│    │
│    ├─── poses/     # estimated face poses
│    │    │
│    │    └─── *.mat
│    │
│    └─── mask256/   # face masks
│         │
│         └─── *.png
│
└─── celebahq/
    ...
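
To see what a pose file contains, here is a minimal sketch using SciPy. The file name 000990.mat is an assumption matching the sample used in the Inference section, and the stored field names vary by Deep3DFaceRecon version, so the snippet simply lists the keys:

import scipy.io as sio

# Load one estimated face pose and print its fields.
pose = sio.loadmat('samples/faces/poses/000990.mat')
for key, value in pose.items():
    if not key.startswith('__'):  # skip MATLAB header entries
        print(key, getattr(value, 'shape', type(value)))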

Pretrained NeRFInvertor models for sample images

We provide pretrained NeRFInvertor models (i.e., fine-tuned generators) for each sample image. The folder includes the optimized latent codes, the fine-tuned models, and the inference results (i.e., rendering outputs).

Inversion

Optimize latent codes

To invert and edit a real image, first align and crop it to the correct size. Use --name=image_name.png to invert a specific image; otherwise, the following command inverts all images in --data_img_dir:

python optimization.py \
    --generator_file='pretrained_models/gram/FFHQ_default/generator.pth' \
    --output_dir='experiments/gram/optimization' \
    --data_img_dir='samples/faces/' \
    --data_pose_dir='samples/faces/poses/' \
    --config='FACES_default' \
    --max_iter=1000
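
Conceptually, this step is a standard GAN-inversion loop: a latent code is optimized so that the image rendered from it matches the target photo. Below is a minimal sketch of the idea; generator, target_img, and cam_pose are placeholders, and the actual interfaces, losses, and weights live in optimization.py:

import torch
import torch.nn.functional as F

def invert(generator, target_img, cam_pose, num_iters=1000, lr=1e-2):
    # Latent code to optimize; the real script uses the generator's own
    # latent dimensionality and initialization scheme.
    z = torch.randn(1, 256, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(num_iters):
        rendered = generator(z, cam_pose)        # render at the target pose
        loss = F.mse_loss(rendered, target_img)  # pixel loss; the paper adds
                                                 # perceptual/identity terms
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()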

Finetune NeRF-GANs

CUDA_VISIBLE_DEVICES=0,1 python finetune.py \
    --target_names='R1.png+R2.png' \
    --config='FACES_finetune' \
    --output_dir='experiments/gram/finetuned_model/' \
    --data_img_dir='samples/faces/' \
    --data_pose_dir='samples/faces/poses/'  \
    --data_emd_dir='experiments/gram/optimization/' \
    --pretrain_model='pretrained_models/gram/FFHQ_default/generator.pth' \
    --load_mask \
    --regulizer_alpha=5 \
    --lambda_id=0.1 \
    --lambda_reg_rgbBefAggregation 10 \
    --lambda_bg_sigma 10
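
This stage fine-tunes the generator weights (rather than the latent code): 2D image-space losses on the rendered target reduce the identity gap, while regularization on in-domain neighborhood samples around the optimized latent code keeps the fine-tuned generator close to the original and suppresses geometry artifacts. A rough sketch under those assumptions follows; all names are placeholders, and the actual regularizers (e.g., the rgbBefAggregation and background-sigma terms) are defined in finetune.py:

import torch
import torch.nn.functional as F

def finetune(generator, frozen, z_opt, target_img, cam_pose,
             steps=200, lambda_reg=10.0, lr=2e-5):
    # `frozen` is assumed to be a deep copy of the original generator with
    # gradients disabled; it supervises the regularization term only.
    opt = torch.optim.Adam(generator.parameters(), lr=lr)
    for _ in range(steps):
        # 2D image-space loss on the target identity.
        loss = F.l1_loss(generator(z_opt, cam_pose), target_img)
        # Sample an in-domain neighbor around z_opt and keep the fine-tuned
        # generator close to the original on that neighbor.
        z_nb = z_opt + 0.1 * torch.randn_like(z_opt)
        with torch.no_grad():
            ref = frozen(z_nb, cam_pose)
        loss = loss + lambda_reg * F.l1_loss(generator(z_nb, cam_pose), ref)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator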

Inference

Rendering results for finetuned models

CUDA_VISIBLE_DEVICES=0 python rendering_using_finetuned_model.py \
    --generator_file='experiments/gram/finetuned_model/000990/generator.pth' \
    --target_name='000990' \
    --output_dir='experiments/gram/rendering_results/' \
    --data_img_dir='samples/faces/' \
    --data_pose_dir='samples/faces/poses/'  \
    --data_emd_dir='experiments/gram/optimization/' \
    --config='FACES_finetune' \
    --image_size 256 \
    --gen_video
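
With --gen_video, the script renders the inverted subject along a camera trajectory and writes the frames to a video. A toy illustration of the idea, assuming a hypothetical make_pose helper and a generator that returns a CHW image tensor in [-1, 1]:

import numpy as np
import imageio

def render_video(generator, z_opt, make_pose, out_path='out.mp4', n_frames=60):
    # `make_pose` is a hypothetical helper mapping a yaw angle to whatever
    # camera parametrization the generator expects.
    writer = imageio.get_writer(out_path, fps=30)
    for yaw in np.linspace(-0.3, 0.3, n_frames):  # small head rotation (radians)
        frame = generator(z_opt, make_pose(yaw))  # 1xCxHxW tensor in [-1, 1]
        frame = ((frame.clamp(-1, 1) + 1) * 127.5).byte()
        writer.append_data(frame.squeeze(0).permute(1, 2, 0).cpu().numpy())
    writer.close()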

Acknowledgements

The structure of this repository is based on the GRAM and PTI repositories. We thank the authors for their excellent work.

Contact

If you have any questions, please contact Yu Yin (yin.yu1@northeastern.edu).

Citation

@inproceedings{yin2023nerfinvertor,
  title={NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation},
  author={Yin, Yu and Ghasedi, Kamran and Wu, HsiangTao and Yang, Jiaolong and Tong, Xin and Fu, Yun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={8539--8548},
  year={2023}
}