E³Gen: Efficient, Expressive and Editable Avatars Generation

arXiv | Project Page

Official PyTorch implementation of the paper "E³Gen: Efficient, Expressive and Editable Avatars Generation".

Getting Started

Prerequisites

The code has been tested in the following environment:

Linux (tested on Ubuntu 20.04 LTS)
Python 3.7
CUDA Toolkit 11.3
PyTorch 1.12.1
MMCV 1.6.0
MMGeneration 0.7.2

Installation

Set up a conda environment as follows:

# Export the PATH of CUDA toolkit
export PATH=/usr/local/cuda-11.3/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.3/lib64:$LD_LIBRARY_PATH

# Create conda environment
conda create -y -n e3gen python=3.7
conda activate e3gen

# Install PyTorch (this script is for CUDA 11.3)
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

# Install MMCV and MMGeneration
pip install -U openmim
mim install mmcv-full==1.6
git clone https://github.com/open-mmlab/mmgeneration && cd mmgeneration && git checkout v0.7.2
pip install -v -e .
cd ..

# Clone this repo and install other dependencies
git clone <this repo> && cd <repo folder>
pip install -r requirements.txt

# Install gaussian-splatting
git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
cd gaussian-splatting/submodules/diff-gaussian-rasterization
python setup.py develop
cd ../simple-knn
python setup.py develop
cd ../../../

# Install dependencies for deformation module
python setup.py develop

# Install pytorch3d
wget https://anaconda.org/pytorch3d/pytorch3d/0.7.1/download/linux-64/pytorch3d-0.7.1-py37_cu113_pyt1121.tar.bz2
conda install --use-local pytorch3d-0.7.1-py37_cu113_pyt1121.tar.bz2
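
Once the steps above have finished, a quick version check can catch a mismatched install early. The following is a minimal sketch, not part of the repository: the version numbers are copied from the Prerequisites list and the install commands, while the import names (`mmcv`, `mmgen`, `pytorch3d`) and the use of each module's `__version__` attribute are assumptions of this example.

```python
import importlib

# Versions taken from the Prerequisites list and install commands.
# The import names below are an assumption of this sketch.
EXPECTED = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "mmcv": "1.6.0",
    "mmgen": "0.7.2",
    "pytorch3d": "0.7.1",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be loaded."""
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError):
        return None
    return getattr(module, "__version__", None)


def find_mismatches(installed, expected=EXPECTED):
    """Map each package that is missing or at the wrong version to (found, wanted).

    A local build suffix such as "+cu113" is ignored in the comparison.
    """
    problems = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None or have.split("+")[0] != want:
            problems[name] = (have, want)
    return problems


if __name__ == "__main__":
    installed = {name: installed_version(name) for name in EXPECTED}
    for name, (have, want) in find_mismatches(installed).items():
        print(f"{name}: found {have}, expected {want}")
```

Run it inside the activated e3gen environment; no output means every listed package was found at the expected version.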

Download the SMPL-X model and the related files used for the avatar representation template and Gaussian initialization.

(Recommended) Run the following command to download all of these files automatically.

Before running it, register on the SMPL-X website and the FLAME website.

bash scripts/fetch_template.sh

After downloading, the structure should look like this:

.
├── assets
├── ...
├── lib
│   └── models
│       └── deformers
│           └── smplx
│               └── SMPLX
│                   └── models
│                       └── smplx
│                           ├── SMPLX_FEMALE.npz
│                           ├── SMPLX_FEMALE.pkl
│                           ├── SMPLX_MALE.npz
│                           ├── SMPLX_MALE.pkl
│                           ├── SMPLX_NEUTRAL.npz
│                           ├── SMPLX_NEUTRAL.pkl
│                           ├── smplx_npz.zip
│                           └── version.txt
└── work_dirs
    └── cache
        └── template
            ├── FLAME_masks.pkl
            ├── head_template_mesh_mouth.obj
            ├── head_template.obj
            ├── SMPL-X__FLAME_vertex_ids.npy
            ├── smplx_uv.obj
            └── smplx_vert_segmentation.json
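
To confirm that the download produced this layout, a short script can list whatever is missing. This is a sketch under the assumption that it is run from the repository root; the file names are copied from the tree above, and only the model and template files are checked (not `smplx_npz.zip` or `version.txt`). The function name `missing_files` is hypothetical.

```python
from pathlib import Path

# Paths copied from the directory tree above, relative to the repository root.
SMPLX_DIR = Path("lib/models/deformers/smplx/SMPLX/models/smplx")
TEMPLATE_DIR = Path("work_dirs/cache/template")

EXPECTED_FILES = [
    SMPLX_DIR / "SMPLX_FEMALE.npz",
    SMPLX_DIR / "SMPLX_FEMALE.pkl",
    SMPLX_DIR / "SMPLX_MALE.npz",
    SMPLX_DIR / "SMPLX_MALE.pkl",
    SMPLX_DIR / "SMPLX_NEUTRAL.npz",
    SMPLX_DIR / "SMPLX_NEUTRAL.pkl",
    TEMPLATE_DIR / "FLAME_masks.pkl",
    TEMPLATE_DIR / "head_template_mesh_mouth.obj",
    TEMPLATE_DIR / "head_template.obj",
    TEMPLATE_DIR / "SMPL-X__FLAME_vertex_ids.npy",
    TEMPLATE_DIR / "smplx_uv.obj",
    TEMPLATE_DIR / "smplx_vert_segmentation.json",
]


def missing_files(repo_root="."):
    """Return the expected files (relative paths) that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All template and model files are in place.")
```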

Alternatively, you can download the files manually and place them in the correct folders.

Put the following files in the work_dirs/cache/template folder:

SMPL-X segmentation file (smplx_vert_segmentation.json)
SMPL-X UV (smplx_uv.obj)
SMPL-X FLAME Correspondence (SMPL-X__FLAME_vertex_ids.npy)
FLAME with mouth Mesh Template (head_template_mesh_mouth.obj)
FLAME Mesh Template (head_template.obj)
FLAME Mask (FLAME_masks.pkl)

Put the SMPL-X model (models_smplx_v1_1.zip) in lib/models/deformers/smplx/SMPLX/.

Extract avatar representation template from downloaded files:

cd lib/models/deformers

# preprocess for uv, obtain new uv for smplx_mouth.obj
python preprocess_smplx.py

# save subdivide smplx mesh and corresponding uv
python subdivide_smplx.py

# save parameters for init
python utils_smplx.py
python utils_uvpos.py
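
If you prefer to run these four steps in one go, the sketch below calls them in the listed order and stops at the first failure. It is not part of the repository: the script names and their order are copied from the commands above, while the helper name `run_in_order` is hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Script names and their order are copied from the commands above.
PREPROCESS_SCRIPTS = [
    "preprocess_smplx.py",
    "subdivide_smplx.py",
    "utils_smplx.py",
    "utils_uvpos.py",
]


def run_in_order(workdir, scripts=PREPROCESS_SCRIPTS):
    """Run each script with the current interpreter, stopping at the first failure."""
    workdir = Path(workdir)
    for script in scripts:
        print(f"Running {script} ...")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed one.
        subprocess.run([sys.executable, script], cwd=workdir, check=True)
```

From the repository root, `run_in_order("lib/models/deformers")` would mirror the commands above, using the Python interpreter of the active environment.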

(Optional, for training and local editing) Download the pretrained VGG used for perceptual loss calculation and save it as work_dirs/cache/vgg16.pt.

Data preparation

Download the THuman2.0 dataset and its corresponding SMPL-X fitting parameters from here. Unzip them to ./data/THuman.
Render the RGB images with ICON.

We made some modifications to the ICON rendering code, so please install our version:

git clone https://github.com/olivia23333/ICON

cd ICON
git checkout e3gen
conda create -n icon python=3.8
conda activate icon
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install -c bottler -c conda-forge nvidiacub pyembree
conda install pytorch3d -c pytorch3d
pip install -r requirements.txt --use-deprecated=legacy-resolver
git clone https://github.com/YuliangXiu/rembg
cd rembg 
pip install -e .
cd ..

bash fetch_data.sh

After installation, run:

# rendering 54 views for each scan
bash scripts/render_thuman.sh

If scripts/render_thuman.sh gets stuck at the mesh.ray.intersects_any function, refer to this issue.

Finally, run the following commands:

cd ..
# convert the rendered images to the training dataset format
python reorganize.py
python split.py

# generate the test cache; we use configs/ssdnerf_avatar_uncond_thuman_conv_16bit.py as /PATH/TO/CONFIG
conda deactivate
conda activate e3gen
CUDA_VISIBLE_DEVICES=0 python tools/inception_stat.py /PATH/TO/CONFIG

The final structure of the training dataset is as follows:

data
└── humanscan_wbg
    ├── human_train
    │   ├── 0000
    │   │   ├── pose    # camera parameters
    │   │   ├── rgb     # rendered images
    │   │   └── smplx   # SMPL-X parameters
    │   ├── ...
    │   └── 0525
    ├── human_test
    └── human_train_cache.pkl
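
Before training, it can help to verify that every scan folder has the three subfolders shown above. The sketch below is an illustration only: the subfolder names come from the tree, the function name `check_split` is hypothetical, and the optional `expected_views` check assumes that each rendered view is stored as exactly one file in rgb/.

```python
from pathlib import Path

# Subfolder names copied from the dataset tree above.
REQUIRED_SUBDIRS = ("pose", "rgb", "smplx")


def check_split(split_dir, expected_views=None):
    """Return {scan_name: [problems]} for scan folders that look incomplete.

    If expected_views is given, the number of files in rgb/ is compared to it
    (assumption: one file per rendered view).
    """
    problems = {}
    for scan in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        issues = [
            f"missing {name}/"
            for name in REQUIRED_SUBDIRS
            if not (scan / name).is_dir()
        ]
        rgb = scan / "rgb"
        if expected_views is not None and rgb.is_dir():
            n_images = sum(1 for f in rgb.iterdir() if f.is_file())
            if n_images != expected_views:
                issues.append(f"rgb/ has {n_images} files, expected {expected_views}")
        if issues:
            problems[scan.name] = issues
    return problems
```

For example, `check_split("data/humanscan_wbg/human_train", expected_views=54)` uses the 54 views mentioned in the rendering step; an empty result means no problems were found.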

Training

Run the following command to train a model:

# For /PATH/TO/CONFIG, we use configs/ssdnerf_avatar_uncond_thuman_conv_16bit.py here
python train.py /PATH/TO/CONFIG --gpu-ids 0 1

Our model was trained on two RTX 3090 (24 GB) GPUs.

Model checkpoints are saved to ./work_dirs. The UV feature planes for the scans are saved to ./cache.

Inference

# For /PATH/TO/CONFIG, we use configs/ssdnerf_avatar_uncond_thuman_conv_16bit.py here
python test.py /PATH/TO/CONFIG /PATH/TO/CHECKPOINT --gpu-ids 0 1

The trained model can be downloaded from here for testing.

Code for editing and novel pose animation will be released soon.

Acknowledgements

This project is built upon many amazing works:

SSDNeRF for the base diffusion backbone
gaussian-splatting
AG3D for deformation module
ICON, NHA, MVP, TADA, DECA and PointAvatar for data preprocessing
StyleGAN2-ADA for perceptual loss

Citation

@article{zhang2024e3gen,
    title={$E^{3}$Gen: Efficient, Expressive and Editable Avatars Generation}, 
    author={Weitian Zhang and Yichao Yan and Yunhui Liu and Xingdong Sheng and Xiaokang Yang},
    year={2024},
    journal={arXiv preprint arXiv:2405.19203},
}
