Network Bending: Expressive Manipulation of Deep Generative Models
Terence Broad, Frederic Fol Leymarie, Mick Grierson
Presented at the 10th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART 2021).
Paper: https://arxiv.org/abs/2005.12420
Video: https://youtu.be/IlSMQ2RRTh8
Abstract: We introduce a new framework for manipulating and interacting with deep generative models that we call network bending. We present a comprehensive set of deterministic transformations that can be inserted as distinct layers into the computational graph of a trained generative neural network and applied during inference. In addition, we present a novel algorithm for analysing the deep generative model and clustering features based on their spatial activation maps. This allows features to be grouped together based on spatial similarity in an unsupervised fashion. This results in the meaningful manipulation of sets of features that correspond to the generation of a broad array of semantically significant features of the generated images. We outline this framework, demonstrating our results on state-of-the-art deep generative models trained on several image datasets. We show how it allows for the direct manipulation of semantically meaningful aspects of the generative process as well as allowing for a broad range of expressive outcomes.
Using our unsupervised clustering algorithm, which finds sets of features based on the spatial similarity of their activation maps, sets of features emerge that correspond to semantic objects and can be manipulated in multiple different ways. For example, cluster 2 in layer 5 controls a set of features responsible for the generation of eyes:
On the left is the original image, followed by the same image with those features ablated, then scaled by a factor of 0.5, then dilated with a kernel of radius 2.
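The dilation in that example could be written as a single entry in the transform config format described further below (this snippet is illustrative only; it is not a config file shipped with the repository):

transforms:
  - layer: 5
    transform: "dilate"
    params: [2]
    features: "cluster"
    feature-param: 2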
In other layers, sets of features are responsible for other kinds of image properties, such as the spatial formation of the face, the highlights on facial regions, the generation of textures, or the contrast of colours in the image.
Transformations can also be chained together to produce distinctive and unusual results:
- Linux (tested on Ubuntu 18.04)
- PyTorch 1.5.0
- CUDA 10.1 or CUDA 10.2
- OpenCV - refer to this dockerfile for installing the correct version
- Libtorch 1.5 (pre-cxx11 ABI) - download here
- PyYAML
We have built a number of TorchScript operators using OpenCV and libtorch. You will need to have downloaded libtorch and installed the correct version of OpenCV for this to work. See the requirements above, or refer to the tutorial for writing your own TorchScript operators for more details.
To build the custom operators, run the accompanying bash script with the path to your downloaded and unzipped libtorch code:
chmod +x ./build_custom_transforms.sh
./build_custom_transforms.sh /path/to/libtorch
If you are having issues with this, you can instead link against the libtorch distributed inside your PyTorch package installation folder: https://discuss.pytorch.org/t/segmentation-fault-when-loading-custom-operator/53301/8?u=tbroad
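Once built, the resulting shared library has to be loaded into the Python process before the custom transform layers can be used. A minimal sketch, assuming the build produces a library called libcustom_transforms.so in a build/ folder (both the name and path are assumptions and depend on your build setup):

import torch

# Assumed output of build_custom_transforms.sh; adjust the path to your build.
torch.ops.load_library("build/libcustom_transforms.so")
# After loading, the operators registered by the C++ code become available
# under torch.ops.<namespace>, where the namespace is whatever the operator
# source registers.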
You can download the official StyleGAN2 FFHQ 1024 model converted to PyTorch format here: https://drive.google.com/drive/u/0/folders/1kxzAxJ9jrU6z9CPBJ8I87dXy-NJFG4zs
Alternatively, refer to the StyleGAN2 PyTorch implementation that this code is based on for training your own models, or for converting models from the TensorFlow format into a PyTorch-compatible format.
You can either generate images from random latents:
python generate.py --ckpt /path/to/model.pt --size 1024 --pics 10 --config config/example_transform_config.yaml
Or from a latent vector that you have projected into StyleGAN's latent space (see the projector command below).
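A likely invocation, mirroring the command below but generating a single image from the projected latent (the --pics value here is an assumption, not the repository's exact example):

python generate.py --ckpt /path/to/model.pt --size 1024 --pics 1 --latent /path/to/latent.pt --config config/example_transform_config.yaml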
If you are using layers with random parameters you can generate multiple different samples from the same latent:
python generate.py --ckpt /path/to/model.pt --size 1024 --pics 100 --latent /path/to/latent.pt --config config/example_transform_config.yaml
The API for defining transforms is a list of transform dictionaries provided as a YAML file, which will look something like this:
transforms:
  - layer: 4
    transform: "translate"
    params: [0.25, 0.0]
    features: "all"
    feature-param:
  - layer: 5
    transform: "ablate"
    params: []
    features: "cluster"
    feature-param: 2
  - layer: 12
    transform: "binary-thresh"
    params: [0.5]
    features: "random"
    feature-param: 0.5
The config must contain a list called transforms, in which any number of transforms can be defined. Each transform dict has 5 fields. The first is layer, which defines which layer the transform is applied to (between 1 and 16). The second and third are features and feature-param, which define which set of convolutional features the transform is applied to. features has three modes: "all" applies the transform to every feature in the layer, in which case feature-param can be left blank; "random" applies the transform to a random selection of features, with feature-param a float between 0 and 1 defining the proportion of features the transform is applied to; and "cluster" applies the transform to a group of features based on the precalculated set of clusters loaded from the cluster dictionary, with feature-param an integer cluster index whose valid range depends on the layer (see Table 1 in the paper for the number of clusters in each layer) and on the cluster dictionary used (you can calculate your own clusters with an arbitrary number per layer).
The fourth and fifth are transform and params. transform is a string naming the transform layer you want to insert, and params is a list with either 0, 1 or 2 numerical parameters. A breakdown of all transforms and their parameter types is listed below, followed by a sketch of loading such a config.
transform: "translate", params: [float (x), float (y)], range (-1 to 1)
transform: "scale", params: [float], range (0, inf)
transform: "rotate", params: [float], range (0,360)
transform: "erode", params: [int], range(1 -)
transform: "dilate", params: [int], range(1 -)
transform: "scalar-multiply", params: [float], range(-inf, inf)
transform: "binary-tresh", params: [float], range(-1,1)
transform: "flip-h", params: []
transform: "flip-v", params: []
transform: "invert", params: []
transform: "ablate", params: []
To project your own images into the StyleGAN latent space (producing a latent file that can be passed to --latent above), run:
python projector.py --ckpt [CHECKPOINT] --size [GENERATOR_OUTPUT_SIZE] IMAGE1 ...
Examples of layer wide transformations being applied to every layer: https://drive.google.com/open?id=1hC9qSw57g2QZ3IggCWEa8BUoY7TcYFfR
Examples of various transformations applied to different clusters: https://drive.google.com/open?id=1oaNco1L1lu7gGgWKNMGqA_vFUFIOVmVx
Trained ShuffleNet CNN models used for clustering: https://drive.google.com/open?id=176GlteP_C3z-EvZDm69uJC8fziwrQOBw
Training and test set used to train CNN classifiers used for clustering: https://drive.google.com/open?id=1F3TOkR8Cu2EOgalXSc1kEpjHwxd_CCbo
This codebase is built upon this excellent StyleGAN2 PyTorch implementation by rosinality.
Model details and custom CUDA kernel code are from the official repository: https://github.com/NVlabs/stylegan2
Code for Learned Perceptual Image Patch Similarity (LPIPS) came from https://github.com/richzhang/PerceptualSimilarity
To match FID scores more closely to the official TensorFlow implementation, I have used the FID Inception V3 implementation from https://github.com/mseitzer/pytorch-fid