StructuredNoiseInjection

TensorFlow implementation of the CVPR2020 paper: Disentangled Image Generation Through Structured Noise Injection


Structured Noise Injection

Paper: https://arxiv.org/abs/2004.12411

Video: https://youtu.be/7h-7wso9E0k

A TensorFlow implementation of structured noise injection as described in the paper. We adapt the original StyleGAN architecture code from https://github.com/NVlabs/stylegan.

The code allows:

  • Disentangled editing of generated images (local features, mid-scale features, pose, and overall style)
  • Training a model with structured noise injection on any dataset
  • Modifying the paper's choices of grid dimensions, local code length, shared code length, and global code length

Examining a pretrained network

We follow the same approach as the original StyleGAN code.

First, download the pretrained network from: https://drive.google.com/file/d/1jxzRnLX2OhPos4E1pqz-7ed4mqVyLwoQ/view?usp=sharing and place it in the same folder as pretrained_SNI.py

To randomly generate a few images and preview the edits our method makes possible:

python3 pretrained_SNI.py

This will generate two unique faces and multiple figures showing specific modifications while preserving the face identity. Any cell of the noise grid can be changed individually by passing an 8x8 binary mask to the function randomize_specific_local_codes, as demonstrated in the example file.
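A minimal sketch of such a mask in NumPy (the 8x8 grid is from the paper; the randomize_specific_local_codes call is shown only as an illustration — see pretrained_SNI.py for its actual signature):

```python
import numpy as np

# 8x8 binary mask over the noise grid:
# 1 = resample that cell's local code, 0 = keep it fixed.
mask = np.zeros((8, 8), dtype=np.int32)
mask[3, 4] = 1  # resample only the cell at row 3, column 4

# Illustrative call; see pretrained_SNI.py for the actual arguments:
# images = randomize_specific_local_codes(Gs, latents, mask)
```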

Changing the globally-shared code entry (affects pose)

Changing the codes that are shared by region (affects mid-level features such as age and accessories)

Changing all local codes (affects the fine details of the face)

Changing specific local codes (4x4 cells around the mouth)

Changing specific local codes (3x7 cells covering the top of the head)
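The two region-specific examples above can be sketched as NumPy masks; the exact cell indices used for the figures are assumptions here (the real ones are in the example file):

```python
import numpy as np

# 4x4 block of cells around the mouth (indices are illustrative).
mouth_mask = np.zeros((8, 8), dtype=np.int32)
mouth_mask[4:8, 2:6] = 1

# 3x7 block covering the top of the head (indices are illustrative).
hair_mask = np.zeros((8, 8), dtype=np.int32)
hair_mask[0:3, 0:7] = 1
```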

Training a network from scratch

To run training on the FFHQ dataset with the default settings: python3 train.py

The network can be trained similarly to training the original StyleGAN but with a different generator. The code for our generator is included under training/networks_structurednoiseinjection.py.

Please refer to https://github.com/NVlabs/stylegan for the datasets and code requirements.

Frequently Asked Questions

Do you use specific losses?

No. We use the existing GAN loss from StyleGAN.

How do you enforce disentanglement?

We use independent codes and independent mapping parameters per location (4x4 or 8x8). This enforces that each (x,y) location of the input tensor is influenced by only a single local code (plus the shared and global codes). Beyond that, we believe the achieved disentanglement is due to our selection of two codes: one more suitable for encoding spatial details, and one more suitable for encoding stylistic information.
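The per-location structure described above can be sketched in NumPy. The code lengths, region count, and region assignment below are illustrative placeholders, not the paper's exact values:

```python
import numpy as np

grid = 8                                          # 8x8 noise grid
len_global, len_shared, len_local = 16, 16, 32    # illustrative lengths
n_regions = 4                                     # cells grouped into 2x2 regions

rng = np.random.default_rng(0)
global_code = rng.standard_normal(len_global)
shared_codes = rng.standard_normal((n_regions, len_shared))
local_codes = rng.standard_normal((grid, grid, len_local))

def cell_code(x, y):
    """Each (x, y) cell sees exactly one local code, its region's
    shared code, and the single global code -- no other cell's
    local code influences it."""
    region = (y // (grid // 2)) * 2 + (x // (grid // 2))  # 2x2 regions
    return np.concatenate([global_code,
                           shared_codes[region],
                           local_codes[y, x]])
```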

How does your method change attributes such as smile and eyeglasses without labels?

Our method is unsupervised, and the network is unaware during training of semantic labels such as smile and eyeglasses. However, disentangled editing is possible because we focus on locations instead of attributes. Due to our structure, the only noise entries that affect the mouth are the noise cells aligned around it. After training, the user can resample only the noise codes around the mouth to change its shape, for example to find codes that add or remove a smile. By 'locking down' details of the face at different places in the noise codes, we enable disentangled editing without labels.

Testing new settings of structured noise injection

Changing cell resolution / changing when to begin style modulation

This can be done in the synthesis part. To change where style modulation begins, edit the layer_epilogue function. Please note that each resolution contains two style modulation layers. Increasing the cell resolution beyond 8x8 is currently difficult, since it forfeits the benefits of progressive growing and of lower-resolution information. By default, the 8x8 resolution is used as in the paper.

Changing global/shared/local code length

This can be done in the mapping function. The user can feed random codes arranged in any way, but must specify how to assemble the final code for each grid cell from those random codes. If the code lengths or arrangement are changed, the my_randoms function in training/misc.py should be updated to check the changes during training.
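As an illustration only, a my_randoms-style generator of the three code groups might look like the following; the function name is borrowed from training/misc.py, but the lengths, shapes, and arrangement below are assumptions:

```python
import numpy as np

def my_randoms_sketch(batch, len_global=16, len_shared=16,
                      len_local=32, grid=8, n_regions=4, seed=None):
    """Illustrative stand-in for my_randoms in training/misc.py:
    for each sample, draw one global code, one shared code per
    region, and one local code per grid cell."""
    rng = np.random.default_rng(seed)
    return (rng.standard_normal((batch, len_global)),
            rng.standard_normal((batch, n_regions, len_shared)),
            rng.standard_normal((batch, grid, grid, len_local)))

g, s, l = my_randoms_sketch(4, seed=0)
```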