FDNeRF: Semantics-Driven Face Reconstruction, Prompt Editing and Relighting with Diffusion Models



The ability to create high-quality 3D faces from a single image has become increasingly important, with wide applications in video conferencing, AR/VR, and advanced video editing in the movie industry. In this paper, we propose Face Diffusion NeRF (FDNeRF), a new generative method to reconstruct high-quality Face NeRFs from single images, complete with semantic editing and relighting capabilities. FDNeRF utilizes high-resolution 3D GAN inversion and an expertly trained 2D latent-diffusion model, allowing users to manipulate and construct Face NeRFs in a zero-shot manner without the need for explicit 3D data. With a carefully designed illumination- and identity-preserving loss, as well as multi-modal pre-training, FDNeRF offers users unparalleled control over the editing process, enabling them to create and edit Face NeRFs using just single-view images, text prompts, and explicit target lighting. The advanced features of FDNeRF are designed to produce more impressive results than existing 2D editing approaches that rely on 2D segmentation maps for editable attributes. Experiments show that FDNeRF achieves exceptionally realistic results and unprecedented flexibility in editing compared with state-of-the-art 3D face reconstruction and editing methods.


Text-Conditioned 3D Editing on a Single Image (including other domains)


Explicit View-Consistent 3D Relighting


Text-Conditioned Generation (including other domains)


Paper & Citation

Link to Paper

If you find this work useful for your research, please cite our paper:

      title={FDNeRF: Semantics-Driven Face Reconstruction, Prompt Editing and Relighting with Diffusion Models}, 
      author={Hao Zhang and Yanbo Xu and Tianyuan Dai and Yu-Wing Tai and Chi-Keung Tang},