DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars

Tobias Kirschstein, Simon Giebenhain and Matthias Nießner
CVPR 2024

1. Installation

1.1. Dependencies

Set up the environment
```
conda env create -f environment.yml
conda activate diffusion-avatars
```
This creates a new conda environment named diffusion-avatars (installation may take a while).
Install the diffusion_avatars package itself by running
```
pip install -e .
```
inside the cloned repository folder.
[Optional Linux] Update LD_LIBRARY_PATH for nvdiffrast
```
ln -s "$CONDA_PREFIX/lib" "$CONDA_PREFIX/lib64"
```
This resolves the linker error /usr/bin/ld: cannot find -lcudart
[Optional Windows] Update CUDA_HOME for nvdiffrast
```
conda env config vars set CUDA_HOME=$Env:CONDA_PREFIX
conda activate base
conda activate diffusion-avatars
```
This prevents nvdiffrast from using a globally installed CUDA toolkit instead of the one from the conda environment.

1.2. Environment Paths

All paths to data / models / renderings are defined by environment variables.
Please create the file ~/.config/diffusion-avatars/.env in your home directory with the following content:

DIFFUSION_AVATARS_DATA_PATH="..."
DIFFUSION_AVATARS_MODELS_PATH="..."
DIFFUSION_AVATARS_RENDERS_PATH="..."

Replace the ... with the locations where data / models / renderings should be located on your machine.

DIFFUSION_AVATARS_DATA_PATH: Location of the multi-view videos and preprocessed 3DMM meshes (See section 2 for how to obtain the dataset)
DIFFUSION_AVATARS_MODELS_PATH: During training, model checkpoints and configs will be saved here
DIFFUSION_AVATARS_RENDERS_PATH: Video renderings of trained models will be stored here

If you prefer not to create a config file in your home directory, you can instead hard-code the paths in env.py.
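
As a rough illustration of how these variables might be read, the sketch below parses a .env file of the form shown above and falls back to it when a variable is not already set in the process environment. The helper names load_env_file and resolve_path are hypothetical and are not part of the diffusion_avatars package; the actual lookup lives in env.py and may work differently.

```python
import os
from pathlib import Path
from typing import Dict

# Location of the config file as described in this README.
ENV_FILE = Path.home() / ".config" / "diffusion-avatars" / ".env"


def load_env_file(path: Path = ENV_FILE) -> Dict[str, str]:
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    if not path.exists():
        return values
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_path(name: str, env_values: Dict[str, str]) -> Path:
    """Prefer a real environment variable, then fall back to the .env file."""
    value = os.environ.get(name) or env_values.get(name)
    if not value:
        raise KeyError(f"{name} is not set in the environment or the .env file")
    return Path(value).expanduser()
```

For example, resolve_path("DIFFUSION_AVATARS_DATA_PATH", load_env_file()) would return the data location, or raise a KeyError if it is defined in neither place.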

2. Data

Data as well as model checkpoints can be found in the Downloads section.

The folder structure assumed by the code looks as follows:

DIFFUSION_AVATARS_DATA_PATH
├── nersemble       # Raw NeRSemble data containing RGB images, etc
│   ├── 018           # Data folder for participant 18  
│   ├── 037
│   ...
└── rendering_data  # Rasterized NPHM meshes that are the input for DiffusionAvatars 
    ├── v1.1-ID-18-nphm   # Dataset of rasterized NPHM meshes for participant 18
    ├── v1.2-ID-37-nphm
    ...

DIFFUSION_AVATARS_MODELS_PATH
└── diffusion-avatars     # Folder for DiffusionAvatars checkpoints
    ├── DA-1-ID-18           # Checkpoint for DiffusionAvatars model trained on participant 18
    ├── DA-2-ID-37
    ...

3. Usage

3.1. Training

python scripts/train/train_diffusion_avatars.py $DATASET

where $DATASET is the identifier of a dataset folder in ${DIFFUSION_AVATARS_DATA_PATH}/rendering_data.
E.g., v1.1 will train DiffusionAvatars for data of person 18 stored in the folder v1.1-ID-18-nphm.
Please find the respective raw NeRSemble data as well as processed datasets in the Downloads section.
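
To make the identifier convention concrete, here is a minimal sketch of how a short identifier such as v1.1 could be matched to its folder inside rendering_data. It assumes the identifier is the part of the folder name before the first "-ID-" segment; find_dataset_folder is a hypothetical helper, and the training script may resolve the identifier differently.

```python
from pathlib import Path


def find_dataset_folder(rendering_data: Path, dataset: str) -> Path:
    """Return the single folder named `dataset` or starting with `dataset-`."""
    matches = sorted(
        p
        for p in rendering_data.iterdir()
        if p.is_dir() and (p.name == dataset or p.name.startswith(dataset + "-"))
    )
    if not matches:
        raise FileNotFoundError(f"No dataset folder for '{dataset}' in {rendering_data}")
    if len(matches) > 1:
        names = ", ".join(p.name for p in matches)
        raise ValueError(f"Identifier '{dataset}' is ambiguous: {names}")
    return matches[0]
```

Matching on the identifier followed by a hyphen keeps v1.1 from also matching a folder whose version merely begins with the same characters.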

Checkpoints and training configurations are stored in a model folder ${DIFFUSION_AVATARS_MODELS_PATH}/diffusion-avatars/DA-xxx-${name}. The incremental run ID xxx is determined automatically.
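
One plausible way such an incremental run ID could be derived is to scan the existing model folders and add one to the highest number found. The sketch below shows that idea only; next_run_id is a hypothetical helper, and the repository's own logic is not documented here and may differ.

```python
import re
from pathlib import Path

# Matches folder names such as "DA-1-ID-18" and captures the run number.
RUN_PATTERN = re.compile(r"^DA-(\d+)(?:-|$)")


def next_run_id(models_dir: Path) -> int:
    """Return 1 + the largest run number among DA-<n>-... folders (1 if none)."""
    if not models_dir.is_dir():
        return 1
    run_ids = []
    for entry in models_dir.iterdir():
        match = RUN_PATTERN.match(entry.name)
        if entry.is_dir() and match:
            run_ids.append(int(match.group(1)))
    return max(run_ids, default=0) + 1
```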

During training, the script logs metrics and images to Weights & Biases (wandb.ai) under the project diffusion-avatars. The held-out test sequences are specified in constants.py.

Memory consumption

Training takes roughly 2 days and requires at least an RTX A6000 GPU (48 GB VRAM). For debugging purposes, the following flags may be used to keep GPU memory consumption below 10 GB:

--batch_size 1 --gradient_accumulation 4 --use_8_bit_adam --mixed_precision FP16 --dataloader_num_workers 0
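
The following sketch shows how the low-memory flags could be appended to the training command from a small Python launcher. The flags and script path are taken from this README; build_train_command and effective_batch_size are hypothetical helpers, and the batch-size arithmetic assumes --gradient_accumulation has its usual meaning of accumulating gradients over several batches before each optimizer step.

```python
import shlex
from typing import List

# Debugging flags listed above for keeping GPU memory below 10 GB.
DEBUG_FLAGS = [
    "--batch_size", "1",
    "--gradient_accumulation", "4",
    "--use_8_bit_adam",
    "--mixed_precision", "FP16",
    "--dataloader_num_workers", "0",
]


def build_train_command(dataset: str, low_memory: bool = False) -> List[str]:
    """Assemble the argument list for the training script."""
    command = ["python", "scripts/train/train_diffusion_avatars.py", dataset]
    if low_memory:
        command += DEBUG_FLAGS
    return command


def effective_batch_size(batch_size: int, gradient_accumulation: int) -> int:
    """Samples contributing to one optimizer step on a single GPU."""
    return batch_size * gradient_accumulation


# Print the command line; pass the list to subprocess.run(...) to launch it.
print(shlex.join(build_train_command("v1.1", low_memory=True)))
```

Under that assumption, the debugging configuration still updates the weights with gradients from 1 x 4 = 4 samples per step while holding only one sample in memory at a time.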

3.2. Rendering

From a trained model DA-xxx, a self-reenactment rendering may be obtained via:

python scripts/render/render_trajectory.py DA-xxx

The resulting .mp4 file is stored in DIFFUSION_AVATARS_RENDERS_PATH.
Please find trained model checkpoints and corresponding raw NeRSemble data in the Downloads section.

3.3. Evaluation

python scripts/evaluate/evaluate.py DA-xxx

will evaluate the self-reenactment scenario for avatar DA-xxx. Please find trained model checkpoints and corresponding processed datasets in the Downloads section. The computed metrics and generated model predictions with paired GT images will be stored in ${DIFFUSION_AVATARS_MODELS_PATH}/diffusion-avatars/DA-xxx-${name}/evaluations.
The key "average_per_sequence_metric" in the generated .json file reproduces the metrics from the paper.
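
A minimal sketch for reading that key back from the evaluation output is shown below. The path to the .json file is left to the caller because its exact name is not stated here, and read_paper_metrics is a hypothetical helper; the structure of the value stored under the key is likewise not assumed.

```python
import json
from pathlib import Path
from typing import Any, Union


def read_paper_metrics(json_path: Union[str, Path]) -> Any:
    """Load an evaluation .json file and return its paper-comparable metrics."""
    with open(json_path, encoding="utf-8") as f:
        results = json.load(f)
    # Key named in this README; a missing key raises KeyError.
    return results["average_per_sequence_metric"]
```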

3.4. Create custom datasets

The script

python scripts/data/create_renderings_dataset.py $PARTICIPANT_ID $SEQUENCES

processes the raw NeRSemble data of $PARTICIPANT_ID for the comma-separated $SEQUENCES. It creates a new folder in ${DIFFUSION_AVATARS_DATA_PATH}/rendering_data that contains the rasterized NPHM images and forms the input for training DiffusionAvatars. Please refer to the provided datasets in the Downloads section for the expected folder layout for the raw NeRSemble data. The NPHM fittings were obtained using MonoNPHM.

3.5. Example Notebooks

The notebooks folder contains minimal examples on how to

Load RGB images and NPHM renderings (visualize_data.ipynb)
Load a trained model and obtain a prediction (inference.ipynb)

4. Downloads

Participant ID	Model	Raw NeRSemble data	Processed Data
18	DA-1-ID-18	ID-18	v1.1-ID-18-nphm
37	DA-2-ID-37	ID-37	v1.2-ID-37-nphm
55	DA-3-ID-55	ID-55	v1.3-ID-55-nphm
124	DA-4-ID-124	ID-124	v1.4-ID-124-nphm
145	DA-5-ID-145	ID-145	v1.5-ID-145-nphm
210	DA-6-ID-210	ID-210	v1.6-ID-210-nphm
251	DA-7-ID-251	ID-251	v1.7-ID-251-nphm
264	DA-8-ID-264	ID-264	v1.8-ID-264-nphm

Participant ID refers to the participants from the NeRSemble dataset. The model zip files contain a checkpoint as well as hyperparameters. The raw NeRSemble data files contain the RGB images, segmentation masks, foreground masks, NPHM fittings (obtained with MonoNPHM), and camera parameters. The processed data archives contain the rasterized NPHM meshes (normals, depth, and canonical coordinates) that serve as the input for DiffusionAvatars.
Before using the raw NeRSemble data or processed data for your own projects, please fill out the NeRSemble dataset Terms of Use.

If you find our paper or code useful, please consider citing

@inproceedings{kirschstein2024diffusionavatars,
  title={DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars},
  author={Kirschstein, Tobias and Giebenhain, Simon and Nie{\ss}ner, Matthias},
  booktitle={Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}

Contact Tobias Kirschstein for questions, comments, and bug reports, or open a GitHub issue.
