Fashion Generation through Controllable StyleGAN


GANs for All: Supporting Fun and Intuitive Exploration of GAN Latent Spaces

Authors: Wei Jiang, Richard Lee Davis, Kevin Gonyop Kim, Pierre Dillenbourg

https://proceedings.mlr.press/v176/jiang22a.html

Abstract: We have developed a new tool that makes it possible for people with zero programming experience to intentionally and meaningfully explore the latent space of a GAN. We combine a number of methods from the literature into a single system that includes multiple functionalities: uploading and locating images in the latent space, image generation with text, visual style mixing, and intentional and intuitive latent space exploration. This tool was developed to provide a means for designers to explore the "design space" of their domains. Our goal was to create a system to support novices in gaining a more complete, expert understanding of their domain's design space by lowering the barrier of entry to using deep generative models in creative practice.

Dataset

We use the Zalando dataset, which can be downloaded from Zalando images and Zalando Text Image Pairs. The dataset consists of 8732 high-resolution images, each depicting a dress available in the Zalando shop against a white background.

Models

Train StyleGAN Model from scratch

!python train.py --outdir "training_runs" --snap 20 --metrics "none" --data "data/square_256_imgs.zip"

To resume training from a checkpoint, pass its path with --resume:

!python train.py --outdir "training_runs" --snap 20 --metrics "none" --data "data/square_256_imgs.zip" --resume "training_runs/00015-square_256_imgs-auto1-resumecustom/network-snapshot-000400.pkl"

Finetune DALL-E Model

!python "DALLE-pytorch/train_dalle.py" --vae_path "DALLE-pytorch/wandb/vae-final.pt" --image_text_folder "data/text_images"

Download the models in the following links and save them in your Google Drive.

Model                     Download
Pretrained fashion GAN    fashion-gan-pretrained.pkl
Finetuned DALL-E model    DALLE-finetuend.pkl

Explore Latent Space

We applied PCA to identify semantically meaningful directions in the latent space. By exploring the first 10 principal components, we found directions that control attributes such as sleeve and pattern.
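The PCA step can be illustrated with a minimal sketch. It assumes latent codes have already been sampled from the generator; here they are replaced by synthetic Gaussian data, and the function names `pca_directions` and `move_along` are hypothetical, not part of the project's code.

```python
import numpy as np

def pca_directions(latents, n_components=10):
    """Return the mean, the top principal directions, and their variances.

    latents: array of shape (n_samples, latent_dim).
    """
    mean = latents.mean(axis=0)
    centered = latents - mean
    # SVD of the centered data: the rows of vt are the principal directions,
    # ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    directions = vt[:n_components]
    variance = singular_values[:n_components] ** 2 / (len(latents) - 1)
    return mean, directions, variance

def move_along(w, direction, step):
    """Shift a latent code along one unit-length direction."""
    return w + step * direction

# Synthetic stand-in (assumption) for latent codes sampled from the GAN.
rng = np.random.default_rng(0)
latents = rng.normal(size=(2000, 512)) * np.linspace(3.0, 0.1, 512)

mean, directions, variance = pca_directions(latents, n_components=10)
# Edit one latent code along the first principal component.
edited = move_along(latents[0], directions[0], step=2.0)
```

In a real pipeline, `edited` would be passed to the generator to see which visual attribute the component changes.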


To project an image into the latent space, we run SGD on the latent code, minimizing a perceptual loss plus a pixel-wise MSE loss between the generated image and the target image. This combined loss noticeably improved our tool's ability to embed out-of-sample examples in the latent space of the GAN.

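$$ w^{*} = \arg\min_{w} L(w), \qquad L(w) = \lVert f(G(w)) - f(I) \rVert_2^2 + \lambda_{pix} \lVert G(w) - I \rVert_2^2 $$

Here $G$ is the generator, $f$ is the feature extractor used for the perceptual loss, $I$ is the target image, and $\lambda_{pix}$ weights the pixel-wise term.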
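To make the objective concrete, here is a toy sketch of the projection as gradient descent. It is not the project's implementation: the generator and the feature extractor are replaced by random linear maps (`G` and `F`, both assumptions) so the gradient can be written by hand, whereas a real system would use the StyleGAN generator, a pretrained perceptual network, and automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, n_pixels, n_features = 8, 64, 16

# Toy linear stand-ins (assumptions): G plays the generator,
# F plays the perceptual feature extractor f.
G = rng.normal(size=(n_pixels, latent_dim)) / np.sqrt(n_pixels)
F = rng.normal(size=(n_features, n_pixels)) / np.sqrt(n_pixels)

def loss(w, target, lambda_pix=1.0):
    """Perceptual term plus weighted pixel-wise term, as in L(w)."""
    image = G @ w
    perceptual = np.sum((F @ image - F @ target) ** 2)
    pixel = np.sum((image - target) ** 2)
    return perceptual + lambda_pix * pixel

def grad(w, target, lambda_pix=1.0):
    """Analytic gradient of the loss for the linear stand-ins."""
    residual = G @ w - target
    return 2 * G.T @ (F.T @ (F @ residual)) + 2 * lambda_pix * (G.T @ residual)

def project(target, lambda_pix=1.0, lr=0.05, steps=1000):
    """Plain gradient descent on the latent code, starting from zero."""
    w = np.zeros(latent_dim)
    for _ in range(steps):
        w -= lr * grad(w, target, lambda_pix)
    return w

# Build a target that the toy generator can reproduce, then recover its code.
w_true = rng.normal(size=latent_dim)
target = G @ w_true
w_star = project(target)
```

Because the toy target lies in the range of `G`, the recovered `w_star` should match `w_true` closely; for out-of-sample images the loss would settle at a nonzero minimum instead.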

Text-to-image Generation

We implemented two methods to locate a design from a text description. The first method randomly samples images from the latent space, then passes them along with the text description through a CLIP model to find a small number of images that most closely match the text. The second method fine-tunes a DALL-E model on the Feidegger dataset, then passes the text description to DALL-E and lets it generate designs. We compare the models as follows:

  • FashionGAN: realistic and diverse, but low resolution.
  • DALLE: diverse and creative, but less accurate.
  • Stable Diffusion: accurate and high resolution, but not diverse (for a specific text prompt, only the background and the models vary).
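The ranking step of the first method (sample images, score them against the text, keep the best matches) can be sketched as follows. This is a minimal illustration that assumes the image and text embeddings have already been computed by a CLIP model; random vectors stand in for them here, and `top_k_matches` is a hypothetical helper name.

```python
import numpy as np

def top_k_matches(image_embeddings, text_embedding, k=3):
    """Return indices and scores of the k images most similar to the text."""
    image_unit = image_embeddings / np.linalg.norm(
        image_embeddings, axis=1, keepdims=True
    )
    text_unit = text_embedding / np.linalg.norm(text_embedding)
    similarity = image_unit @ text_unit  # cosine similarity per image
    order = np.argsort(-similarity)[:k]  # highest similarity first
    return order, similarity[order]

# Random stand-ins (assumption) for CLIP embeddings of 100 sampled images.
rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(100, 512))
# Construct a text embedding that is close to image 42 for illustration.
text_embedding = image_embeddings[42] + 0.1 * rng.normal(size=512)

indices, scores = top_k_matches(image_embeddings, text_embedding, k=3)
```

The latent codes belonging to the returned indices would then be shown to the user as candidate designs.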


WebApp

We have built a website for user testing: https://generative.fashion

YouTube Demo

To run it in Google Colab: Open In Colab

The interface of our neural design space exploration tool. Users can upload images to the workspace on the left or generate a random image with the random button. They can also generate examples from text descriptions using the text box. Users can drag these examples to the style-mixing region or save them in the workspace. With the visual style-mixing panel, users can selectively combine elements from three designs. The output image is shown in the center of the canvas on the right. The two-dimensional canvas represents the design space for two attributes, one along the horizontal axis and one along the vertical axis, and each attribute can be changed with a drop-down menu for its axis. Dragging the image within the canvas is equivalent to moving through the latent space of the GAN in semantically meaningful directions.
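The canvas and style-mixing behaviour described above can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's code: the per-layer latent shape of (14, 512), the layer split points, the `scale` factor, and the function names are all hypothetical.

```python
import numpy as np

def canvas_to_latent(w_center, direction_x, direction_y, x, y, scale=3.0):
    """Map canvas coordinates in [-1, 1] x [-1, 1] to a latent code.

    The canvas center corresponds to w_center; each axis moves the code
    along one semantic direction (for example, a PCA component).
    """
    return w_center + scale * (x * direction_x + y * direction_y)

def style_mix(w_a, w_b, w_c, coarse_end=4, middle_end=8):
    """Take coarse layers from design A, middle layers from B, fine from C.

    Each input is a per-layer latent of shape (num_layers, latent_dim).
    """
    mixed = w_c.copy()
    mixed[:coarse_end] = w_a[:coarse_end]
    mixed[coarse_end:middle_end] = w_b[coarse_end:middle_end]
    return mixed

rng = np.random.default_rng(0)
num_layers, latent_dim = 14, 512  # assumed per-layer latent shape

# Three random stand-in designs and two stand-in attribute directions.
w_a, w_b, w_c = rng.normal(size=(3, num_layers, latent_dim))
direction_x = rng.normal(size=latent_dim)
direction_y = rng.normal(size=latent_dim)

mixed = style_mix(w_a, w_b, w_c)
# Dragging the image to the right edge of the canvas, halfway down.
dragged = canvas_to_latent(mixed, direction_x, direction_y, x=1.0, y=-0.5)
```

Switching an axis in the drop-down menu would correspond to swapping `direction_x` or `direction_y` for a different direction.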


Citation

@InProceedings{pmlr-v176-jiang22a,
  title =     {GANs for All: Supporting Fun and Intuitive Exploration of GAN Latent Spaces},
  author =    {Jiang, Wei and Davis, Richard Lee and Kim, Kevin Gonyop and Dillenbourg, Pierre},
  booktitle = {Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track},
  pages =     {292--296},
  year =      {2022}
}

Acknowledgements

This project and application were developed as a semester project at the EPFL CHILI Lab.