Abstract: Unconditional human image generation is an important task in vision and graphics, enabling various applications in the creative industry. Existing studies in this field mainly focus on “network engineering”, such as designing new components and objective functions. In this work, we take a data-centric perspective and investigate multiple critical aspects of “data engineering”, which we believe complement the current practice. To facilitate a comprehensive study, we collect and annotate a large-scale human image dataset with diverse poses and textures. Equipped with this large dataset, we rigorously investigate three essential factors in data preparation for human generation, namely data size, data distribution, and data alignment. Extensive experiments reveal several valuable observations w.r.t. these aspects: 1) Large-scale data, more than 40K images, are needed to train a high-fidelity unconditional human generation model with vanilla StyleGAN. 2) A balanced training set helps improve generation quality for rare faces and clothing textures compared to its long-tailed counterpart. 3) Human GAN models that use body centers for alignment outperform models trained with face or torso centers as alignment anchors. In addition, a model zoo and editing applications are demonstrated to facilitate future research in the community.
Keywords: Human Image Generation, Data-Centric, StyleGAN
[Demo Video] | [Project Page] | [Paper]
Structure | 1024x512 | 512x256 |
---|---|---|
StyleGAN1 | stylegan_human_v1_1024.pkl | to be released |
StyleGAN2 | stylegan_human_v2_1024.pkl | stylegan_human_v2_512.pkl |
StyleGAN3 | to be released | stylegan_human_v3_512.pkl |
To help users get started, we prepare a Jupyter notebook here. It allows users to run inference with the provided models and to visualize style mixing, interpolation, and attribute editing.
The notebook will guide users through installing the necessary environment and downloading the pretrained models. The output images can be found in ./StyleGAN-Human/outputs/.
- The original code bases are derived from stylegan (TensorFlow), stylegan2-ada (PyTorch), and stylegan3 (PyTorch), released by NVIDIA.
- We tested with Python 3.8.5 and PyTorch 1.9.1 with CUDA 11.1, as well as PyTorch 1.7.1 with CUDA 10.1. (See https://pytorch.org for PyTorch install instructions.)
To work with this project on your own machine, we also provide integrated environment installation instructions:
conda env create -f environment.yml
conda activate stylehuman
# Optional: TensorFlow 1.x is required for StyleGAN1.
pip install nvidia-pyindex
pip install nvidia-tensorflow[horovod]
pip install nvidia-tensorboard==1.15
Extra notes:
- If you encounter conflicts with the CUDA version, try emptying LD_LIBRARY_PATH. For example:
LD_LIBRARY_PATH=; python generate.py --outdir=out/stylegan_human_v2_1024 --trunc=1 --seeds=1,3,5,7
--network=pretrained_models/stylegan_human_v2_1024.pkl --version 2
Please put the pretrained models downloaded from the links above under the folder 'pretrained_models'.
# Generate human full-body images without truncation
python generate.py --outdir=outputs/generate/stylegan_human_v2_1024 --trunc=1 --seeds=1,3,5,7 --network=pretrained_models/stylegan_human_v2_1024.pkl --version 2
# Generate human full-body images with truncation
python generate.py --outdir=outputs/generate/stylegan_human_v2_1024 --trunc=0.8 --seeds=0-10 --network=pretrained_models/stylegan_human_v2_1024.pkl --version 2
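The --trunc flag controls the truncation trick: sampled latents are pulled toward the average latent w_avg, trading diversity for fidelity. A minimal sketch of the idea in plain Python (the vectors and w_avg below are toy values, not taken from the model):

```python
def truncate(w, w_avg, psi):
    """Truncation trick: interpolate a latent toward the average latent.

    psi=1.0 leaves w unchanged (no truncation); psi=0.0 collapses w to w_avg.
    """
    return [wa + psi * (wi - wa) for wi, wa in zip(w, w_avg)]

w_avg = [0.0, 0.0, 0.0]  # average latent (toy values)
w = [2.0, -1.0, 0.5]     # sampled latent (toy values)

print(truncate(w, w_avg, 1.0))  # -> [2.0, -1.0, 0.5]  (trunc=1: unchanged)
print(truncate(w, w_avg, 0.8))  # -> [1.6, -0.8, 0.4]  (trunc=0.8: pulled toward w_avg)
```

Lower psi values produce more "average", artifact-free humans at the cost of variety.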
# Generate human full-body images using stylegan V1
python generate.py --outdir=outputs/generate/stylegan_human_v1_1024 --network=pretrained_models/stylegan_human_v1_1024.pkl --version 1 --seeds=1,3,5
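generate.py accepts seeds either as a comma-separated list (--seeds=1,3,5,7) or as an inclusive range (--seeds=0-10). A minimal sketch of how such a spec can be parsed (a hypothetical helper, not the repository's actual parser):

```python
def parse_seeds(spec):
    """Parse a seed spec like '1,3,5,7' or '0-10' into a list of ints."""
    seeds = []
    for part in spec.split(','):
        if '-' in part:
            lo, hi = part.split('-')
            seeds.extend(range(int(lo), int(hi) + 1))  # range is inclusive
        else:
            seeds.append(int(part))
    return seeds

print(parse_seeds('1,3,5,7'))  # -> [1, 3, 5, 7]
print(parse_seeds('0-10'))     # -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```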
python interpolation.py --network=pretrained_models/stylegan_human_v2_1024.pkl --seeds=85,100 --outdir=outputs/inter_gifs
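The interpolation script morphs between the images of two seeds by walking between their latent codes. The core step is a simple interpolation (some implementations use spherical interpolation in z-space; the sketch below shows plain linear interpolation with toy 2-D latents):

```python
def lerp(a, b, t):
    """Linearly interpolate between latent vectors a and b at t in [0, 1]."""
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]

w_a = [0.0, 1.0]  # latent for the first seed (toy values)
w_b = [1.0, 3.0]  # latent for the second seed (toy values)

# A few evenly spaced frames, e.g. for assembling a GIF.
frames = [lerp(w_a, w_b, i / 4) for i in range(5)]
print(frames[2])  # -> [0.5, 2.0] (midpoint)
```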
python style_mixing.py --network=pretrained_models/stylegan_human_v2_1024.pkl --rows=85,100,75,458,1500 \\
--cols=55,821,1789,293 --styles=0-3 --outdir=outputs/stylemixing
python stylemixing_video.py --network=pretrained_models/stylegan_human_v2_1024.pkl --row-seed=3859 \\
--col-seeds=3098,31759,3791 --col-styles=8-12 --trunc=0.8 --outdir=outputs/stylemixing_video
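The --styles / --col-styles flags select which synthesis layers take their style from the column image: early layers control coarse attributes (pose, body shape), later layers control fine ones (color, texture). A minimal sketch of the mixing operation itself, with toy 1-D per-layer styles (real W+ codes are 512-D per layer):

```python
def style_mix(row_ws, col_ws, col_styles):
    """Replace the per-layer styles listed in col_styles with those of col_ws.

    row_ws / col_ws: one style code per synthesis layer (W+ space, toy 1-D here).
    """
    mixed = list(row_ws)
    for layer in col_styles:
        mixed[layer] = col_ws[layer]
    return mixed

row = [0.0] * 6  # styles of the "row" image (toy values)
col = [1.0] * 6  # styles of the "column" image (toy values)
print(style_mix(row, col, range(0, 4)))  # -> [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```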
python edit.py --network pretrained_models/stylegan_human_v2_1024.pkl --attr_name upper_length \\
--seeds 61531,61570,61571,61610 --outdir outputs/edit_results
Note:
- 'upper_length' and 'bottom_length' are available as 'attr_name' in the demo.
- Layers to control and editing strength are set in edit/edit_config.py.
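Both editing backends shift a latent along a learned per-attribute direction; in the InterFaceGAN case this is a linear move along the normal of a learned attribute boundary, scaled by the editing strength from edit/edit_config.py. A toy sketch of that core step (all vectors and values below are illustrative, not the trained directions):

```python
def edit_latent(w, direction, strength):
    """InterFaceGAN-style edit: shift a latent along an attribute direction."""
    return [wi + strength * di for wi, di in zip(w, direction)]

w = [0.25, -0.1, 0.5]                # latent of the image to edit (toy values)
upper_length_dir = [1.0, 0.0, -1.0]  # attribute boundary normal (toy values)

print(edit_latent(w, upper_length_dir, 0.5))  # -> [0.75, -0.1, 0.0]
```

Negative strengths move the attribute in the opposite direction (e.g. shorter instead of longer sleeves).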
Demo for InsetGAN
We implement a quick demo using the key idea from InsetGAN: combine a face generated by FFHQ with a human body generated by our pretrained model, then optimize both the face and body latent codes to obtain a coherent full-body image. Before running the script, you need to download the FFHQ face model (or use your own face model), as well as the pretrained face landmark predictor and the pretrained CNN face detection model for dlib.
python insetgan.py --body_network=pretrained_models/stylegan_human_v2_1024.pkl --face_network=pretrained_models/ffhq.pkl \\
--body_seed=82 --face_seed=43 --trunc=0.6 --outdir=outputs/insetgan/ --video 1
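InsetGAN alternates between optimizing the face latent and the body latent until the pasted face and the body agree at the seam. The real objective combines perceptual and boundary losses on images; the sketch below only illustrates the alternating scheme, with scalar "latents" and a made-up quadratic loss (purely illustrative, not the paper's objective):

```python
def joint_loss(f, b, f0, b0):
    """Toy coherence loss: the two latents should agree at the seam
    while each staying close to its initial value."""
    return (f - b) ** 2 + (f - f0) ** 2 + (b - b0) ** 2

def alternate_optimize(f0, b0, steps=20):
    """Alternating optimization: fix one latent, minimize over the other.

    With this quadratic loss each sub-problem has a closed-form minimizer.
    """
    f, b = f0, b0
    for _ in range(steps):
        f = (b + f0) / 2  # argmin of joint_loss over f, with b fixed
        b = (f + b0) / 2  # argmin of joint_loss over b, with f fixed
    return f, b

f, b = alternate_optimize(0.0, 1.0)
# The joint loss is strictly lower than at the starting point.
assert joint_loss(f, b, 0.0, 1.0) < joint_loss(0.0, 1.0, 0.0, 1.0)
```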
Edit upper length (StyleSpace) | Edit bottom length (StyleSpace) | Edit upper length (InterFaceGAN) | Edit bottom length (InterFaceGAN) |
---|---|---|---|
![]() | ![]() | ![]() | ![]() |
For more demos, please visit our web page.
- Release Dataset
- Release 1024x512 version of StyleGAN-Human based on StyleGAN3
- Release 512x256 version of StyleGAN-Human based on StyleGAN1
- Release face model for the downstream task: InsetGAN
- Add Inversion Script into the provided editing pipeline