This repository provides an unofficial tutorial for StyleCLIP.
- The original paper and source code can be found at this link.
- Face manipulation example
- Text prompt: "A really sad face"
- Face manipulation animation
- CLIP jointly trains an image encoder and a text encoder on a large dataset of (image, text) pairs.
- The cosine similarity between an image and text feature is high if they have similar semantic meanings.
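To illustrate, the CLIP similarity score is just the cosine of the angle between the L2-normalized image and text embeddings. A minimal NumPy sketch with random stand-in features (real features would come from CLIP's encoders):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

# Stand-in 512-d features; CLIP's actual embeddings are also 512-d.
rng = np.random.default_rng(0)
image_feat = rng.standard_normal(512)
text_feat = image_feat + 0.1 * rng.standard_normal(512)  # semantically "close"
other_feat = rng.standard_normal(512)                    # unrelated

print(cosine_similarity(image_feat, text_feat))   # high (near 1)
print(cosine_similarity(image_feat, other_feat))  # near 0
```

Semantically matching pairs score close to 1, while unrelated pairs score near 0.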
- StyleCLIP provides three methods, each building on previous studies: latent optimization, a latent mapper, and global directions.
- Tutorial: Lecture note
- Tutorial: Video: Paper explained
- Google Colab tutorial source code
- This is a simple approach that leverages CLIP to guide image manipulation.
- The optimization method requires 200-300 iterations, which take several minutes per image.
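Conceptually, the optimization method runs gradient descent on the latent code to minimize a CLIP-based loss (the paper also adds L2 and identity terms). A toy NumPy sketch of the loop, with a hypothetical quadratic stand-in `clip_loss` in place of the real CLIP distance between the generated image and the prompt:

```python
import numpy as np

def clip_loss(w, w_target):
    """Stand-in for the CLIP loss D_CLIP(G(w), t): distance to a text-aligned target."""
    return float(np.sum((w - w_target) ** 2))

def grad(w, w_target):
    """Gradient of the stand-in loss with respect to the latent w."""
    return 2 * (w - w_target)

rng = np.random.default_rng(0)
w = rng.standard_normal(512)         # initial latent (e.g. an inverted source image)
w_target = rng.standard_normal(512)  # hypothetical latent matching the text prompt

lr = 0.05
for step in range(300):              # 200-300 iterations, as in the paper
    w = w - lr * grad(w, w_target)

print(clip_loss(w, w_target))        # loss has shrunk toward 0
```

In the real method the loss is backpropagated through the StyleGAN generator and CLIP's image encoder, which is what makes each edit take minutes.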
- After being trained for a given text prompt (about 10 hours), the mapper manipulates attributes in a single forward pass.
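The mapper is a small network, trained per prompt, that predicts a residual edit to add to the latent code. A minimal sketch with hypothetical random weights (the paper's mapper actually uses separate MLPs for the coarse, medium, and fine layer groups):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-layer mapper: latent w (512-d) -> edit residual (512-d).
W1 = rng.standard_normal((512, 512)) * 0.01
W2 = rng.standard_normal((512, 512)) * 0.01

def mapper(w):
    """One forward pass: predict an edit direction for latent w."""
    h = np.maximum(0, w @ W1)  # ReLU hidden layer
    return h @ W2

w = rng.standard_normal(512)
w_edited = w + mapper(w)       # manipulation is a single forward pass
print(w_edited.shape)
```

Once trained, editing a new image costs one forward pass instead of minutes of optimization.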
- Find global directions in a StyleGAN's style space S.
- After finding a global direction, we can apply it to the style code s of any image.
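Applying a global direction is a single vector operation, s' = s + α·Δs, where α controls the manipulation strength (and its sign flips the edit direction). A sketch with stand-in values; the dimensionality of S shown here is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal(6048)        # style code in S space (dimensionality assumed)
delta_s = rng.standard_normal(6048)  # stand-in for a learned global direction
delta_s /= np.linalg.norm(delta_s)   # unit-norm direction for the text prompt

alpha = 3.0                          # manipulation strength
s_edited = s + alpha * delta_s       # same direction works for any latent s
print(s_edited.shape)
```

Because Δs is computed once per prompt, editing new images is essentially free.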