Abstract: Unconditional human image generation is an important task in vision and graphics, enabling various applications in the creative industry. Existing studies in this field mainly focus on “network engineering”, such as designing new components and objective functions. In this work, we take a data-centric perspective and investigate multiple critical aspects of “data engineering”, which we believe complement the current practice. To facilitate a comprehensive study, we collect and annotate a large-scale human image dataset with diverse poses and textures. Equipped with this large dataset, we rigorously investigate three essential factors in data preparation for human generation, namely data size, data distribution, and data alignment. Extensive experiments reveal several valuable observations w.r.t. these aspects: 1) Large-scale data, more than 40K images, are needed to train a high-fidelity unconditional human generation model with vanilla StyleGAN. 2) A balanced training set helps improve generation quality for rare faces and clothing textures compared to its long-tailed counterpart. 3) Human GAN models that use body centers for alignment outperform models trained with face or torso centers as alignment anchors. In addition, a model zoo and editing applications are demonstrated to facilitate future research in the community.
Keywords: Human Image Generation, Data-Centric, StyleGAN
[Demo Video] | [Project Page] | [Paper]
Structure | 1024x512 | 512x256 |
---|---|---|
StyleGAN1 | stylegan_human_v1_1024.pkl | to be released |
StyleGAN2 | stylegan_human_v2_1024.pkl | stylegan_human_v2_512.pkl |
StyleGAN3 | to be released | stylegan_human_v3_512.pkl |
To help users get started, we prepare a Jupyter notebook here. It allows users to run inference with the provided models and to visualize style mixing, interpolation, and attribute editing.
The notebook will guide users through installing the necessary environment and downloading the pretrained models. The output images can be found in ./StyleGAN-Human/outputs/.
- The original code bases are derived from stylegan (TensorFlow), stylegan2-ada (PyTorch), and stylegan3 (PyTorch), released by NVIDIA.
- We tested with Python 3.8.5 and PyTorch 1.9.1 with CUDA 11.1, as well as PyTorch 1.7.1 with CUDA 10.1. (See https://pytorch.org for PyTorch install instructions.)
To work with this project on your own machine, we also provide integrated environment installation instructions:
conda env create -f environment.yml
conda activate stylehuman
# Optional: TensorFlow 1.x is required for StyleGAN1.
pip install nvidia-pyindex
pip install nvidia-tensorflow[horovod]
pip install nvidia-tensorboard==1.15
Extra notes:
- If you encounter conflicts with the CUDA version, try emptying LD_LIBRARY_PATH. For example:
LD_LIBRARY_PATH=; python generate.py --outdir=out/stylegan_human_v2_1024 --trunc=1 --seeds=1,3,5,7
--network=pretrained_models/stylegan_human_v2_1024.pkl --version 2
Please put the pretrained models downloaded from the links above under the folder 'pretrained_models'.
# Generate human full-body images without truncation
python generate.py --outdir=outputs/generate/stylegan_human_v2_1024 --trunc=1 --seeds=1,3,5,7 --network=pretrained_models/stylegan_human_v2_1024.pkl --version 2
# Generate human full-body images with truncation
python generate.py --outdir=outputs/generate/stylegan_human_v2_1024 --trunc=0.8 --seeds=0-10 --network=pretrained_models/stylegan_human_v2_1024.pkl --version 2
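The --trunc flag controls the truncation trick: sampled latents are pulled toward the average latent w_avg, trading diversity for fidelity. A minimal sketch of the idea in plain Python (the vectors and w_avg below are toy values, not taken from the model):

```python
def truncate(w, w_avg, psi):
    """Truncation trick: interpolate a latent toward the average latent.

    psi=1.0 leaves w unchanged (no truncation); psi=0.0 collapses w to w_avg.
    """
    return [wa + psi * (wi - wa) for wi, wa in zip(w, w_avg)]

w_avg = [0.0, 0.0, 0.0]  # average latent (toy values)
w = [2.0, -1.0, 0.5]     # sampled latent (toy values)

print(truncate(w, w_avg, 1.0))  # -> [2.0, -1.0, 0.5]  (trunc=1: unchanged)
print(truncate(w, w_avg, 0.8))  # -> [1.6, -0.8, 0.4]  (trunc=0.8: pulled toward w_avg)
```

Lower psi values produce more "average", artifact-free humans at the cost of variety.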
# Generate human full-body images using stylegan V1
python generate.py --outdir=outputs/generate/stylegan_human_v1_1024 --network=pretrained_models/stylegan_human_v1_1024.pkl --version 1 --seeds=1,3,5
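generate.py accepts seeds either as a comma-separated list (--seeds=1,3,5,7) or as an inclusive range (--seeds=0-10). A minimal sketch of how such a spec can be parsed (a hypothetical helper, not the repository's actual parser):

```python
def parse_seeds(spec):
    """Parse a seed spec like '1,3,5,7' or '0-10' into a list of ints."""
    seeds = []
    for part in spec.split(','):
        if '-' in part:
            lo, hi = part.split('-')
            seeds.extend(range(int(lo), int(hi) + 1))  # range is inclusive
        else:
            seeds.append(int(part))
    return seeds

print(parse_seeds('1,3,5,7'))  # -> [1, 3, 5, 7]
print(parse_seeds('0-10'))     # -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```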
python interpolation.py --network=pretrained_models/stylegan_human_v2_1024.pkl --seeds=85,100 --outdir=outputs/inter_gifs
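The interpolation script morphs between the images of two seeds by walking between their latent codes. The core step is a simple interpolation (some implementations use spherical interpolation in z-space; the sketch below shows plain linear interpolation with toy 2-D latents):

```python
def lerp(a, b, t):
    """Linearly interpolate between latent vectors a and b at t in [0, 1]."""
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]

w_a = [0.0, 1.0]  # latent for the first seed (toy values)
w_b = [1.0, 3.0]  # latent for the second seed (toy values)

# A few evenly spaced frames, e.g. for assembling a GIF.
frames = [lerp(w_a, w_b, i / 4) for i in range(5)]
print(frames[2])  # -> [0.5, 2.0] (midpoint)
```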
python style_mixing.py --network=pretrained_models/stylegan_human_v2_1024.pkl --rows=85,100,75,458,1500 \\
--cols=55,821,1789,293 --styles=0-3 --outdir=outputs/stylemixing
python stylemixing_video.py --network=pretrained_models/stylegan_human_v2_1024.pkl --row-seed=3859 \\
--col-seeds=3098,31759,3791 --col-styles=8-12 --trunc=0.8 --outdir=outputs/stylemixing_video
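The --styles / --col-styles flags select which synthesis layers take their style from the column image: early layers control coarse attributes (pose, body shape), later layers control fine ones (color, texture). A minimal sketch of the mixing operation itself, with toy 1-D per-layer styles (real W+ codes are 512-D per layer):

```python
def style_mix(row_ws, col_ws, col_styles):
    """Replace the per-layer styles listed in col_styles with those of col_ws.

    row_ws / col_ws: one style code per synthesis layer (W+ space, toy 1-D here).
    """
    mixed = list(row_ws)
    for layer in col_styles:
        mixed[layer] = col_ws[layer]
    return mixed

row = [0.0] * 6  # styles of the "row" image (toy values)
col = [1.0] * 6  # styles of the "column" image (toy values)
print(style_mix(row, col, range(0, 4)))  # -> [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```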
python edit.py --network pretrained_models/stylegan_human_v2_1024.pkl --attr_name upper_length \\
--seeds 61531,61570,61571,61610 --outdir outputs/edit_results
Note:
- 'upper_length' and 'bottom_length' are available as 'attr_name' in the demo.
- Layers to control and editing strength are set in edit/edit_config.py.
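Both editing backends shift a latent along a learned per-attribute direction; in the InterFaceGAN case this is a linear move along the normal of a learned attribute boundary, scaled by the editing strength from edit/edit_config.py. A toy sketch of that core step (all vectors and values below are illustrative, not the trained directions):

```python
def edit_latent(w, direction, strength):
    """InterFaceGAN-style edit: shift a latent along an attribute direction."""
    return [wi + strength * di for wi, di in zip(w, direction)]

w = [0.25, -0.1, 0.5]                # latent of the image to edit (toy values)
upper_length_dir = [1.0, 0.0, -1.0]  # attribute boundary normal (toy values)

print(edit_latent(w, upper_length_dir, 0.5))  # -> [0.75, -0.1, 0.0]
```

Negative strengths move the attribute in the opposite direction (e.g. shorter instead of longer sleeves).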
Demo for InsetGAN
We implement a quick demo using the key idea from InsetGAN: combine a face generated by FFHQ with a human body generated by our pretrained model, then optimize both the face and body latent codes to obtain a coherent full-body image. Before running the script, you need to download the FFHQ face model (or use your own face model), as well as the pretrained face landmark predictor and the pretrained CNN face detection model for dlib.
python insetgan.py --body_network=pretrained_models/stylegan_human_v2_1024.pkl --face_network=pretrained_models/ffhq.pkl \\
--body_seed=82 --face_seed=43 --trunc=0.6 --outdir=outputs/insetgan/ --video 1
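InsetGAN alternates between optimizing the face latent and the body latent until the pasted face and the body agree at the seam. The real objective combines perceptual and boundary losses on images; the sketch below only illustrates the alternating scheme, with scalar "latents" and a made-up quadratic loss (purely illustrative, not the paper's objective):

```python
def joint_loss(f, b, f0, b0):
    """Toy coherence loss: the two latents should agree at the seam
    while each staying close to its initial value."""
    return (f - b) ** 2 + (f - f0) ** 2 + (b - b0) ** 2

def alternate_optimize(f0, b0, steps=20):
    """Alternating optimization: fix one latent, minimize over the other.

    With this quadratic loss each sub-problem has a closed-form minimizer.
    """
    f, b = f0, b0
    for _ in range(steps):
        f = (b + f0) / 2  # argmin of joint_loss over f, with b fixed
        b = (f + b0) / 2  # argmin of joint_loss over b, with f fixed
    return f, b

f, b = alternate_optimize(0.0, 1.0)
# The joint loss is strictly lower than at the starting point.
assert joint_loss(f, b, 0.0, 1.0) < joint_loss(0.0, 1.0, 0.0, 1.0)
```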
Edit upper length (StyleSpace) | Edit bottom length (StyleSpace) | Edit upper length (InterFaceGAN) | Edit bottom length (InterFaceGAN) |
---|---|---|---|
![]() | ![]() | ![]() | ![]() |
For more demos, please visit our web page.
- Release Dataset
- Release 1024x512 version of StyleGAN-Human based on StyleGAN3
- Release 512x256 version of StyleGAN-Human based on StyleGAN1
- Release face model for the downstream task: InsetGAN
- Add Inversion Script into the provided editing pipeline