This repository provides an unofficial tutorial for StyleCLIP.
- The original paper and source code can be found at this link.
- Face manipulation example
- Text prompt: "A really sad face"
- Face manipulation animation
- CLIP jointly trains an image encoder and a text encoder on a large dataset of (image, text) pairs.
- The cosine similarity between an image and text feature is high if they have similar semantic meanings.
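To illustrate, the CLIP similarity score is just the cosine of the angle between the L2-normalized image and text embeddings. A minimal NumPy sketch with random stand-in features (real features would come from CLIP's encoders):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

# Stand-in 512-d features; CLIP's actual embeddings are also 512-d.
rng = np.random.default_rng(0)
image_feat = rng.standard_normal(512)
text_feat = image_feat + 0.1 * rng.standard_normal(512)  # semantically "close"
other_feat = rng.standard_normal(512)                    # unrelated

print(cosine_similarity(image_feat, text_feat))   # high (near 1)
print(cosine_similarity(image_feat, other_feat))  # near 0
```

Semantically matching pairs score close to 1, while unrelated pairs score near 0.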
- StyleCLIP provides three methods, each building on previous studies: latent optimization, a latent mapper, and global directions.
- Tutorial: Lecture note
- Tutorial: Video: Paper explained
- Google Colab tutorial source code
- This is a simple approach that leverages CLIP to guide image manipulation.
- The optimization method requires 200-300 iterations, which take several minutes per image.
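Conceptually, the optimization method runs gradient descent on the latent code to minimize a CLIP-based loss (the paper also adds L2 and identity terms). A toy NumPy sketch of the loop, with a hypothetical quadratic stand-in `clip_loss` in place of the real CLIP distance between the generated image and the prompt:

```python
import numpy as np

def clip_loss(w, w_target):
    """Stand-in for the CLIP loss D_CLIP(G(w), t): distance to a text-aligned target."""
    return float(np.sum((w - w_target) ** 2))

def grad(w, w_target):
    """Gradient of the stand-in loss with respect to the latent w."""
    return 2 * (w - w_target)

rng = np.random.default_rng(0)
w = rng.standard_normal(512)         # initial latent (e.g. an inverted source image)
w_target = rng.standard_normal(512)  # hypothetical latent matching the text prompt

lr = 0.05
for step in range(300):              # 200-300 iterations, as in the paper
    w = w - lr * grad(w, w_target)

print(clip_loss(w, w_target))        # loss has shrunk toward 0
```

In the real method the loss is backpropagated through the StyleGAN generator and CLIP's image encoder, which is what makes each edit take minutes.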
- After being trained for a given text prompt (about 10 hours), the mapper manipulates attributes in a single forward pass.
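The mapper is a small network, trained per prompt, that predicts a residual edit to add to the latent code. A minimal sketch with hypothetical random weights (the paper's mapper actually uses separate MLPs for the coarse, medium, and fine layer groups):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-layer mapper: latent w (512-d) -> edit residual (512-d).
W1 = rng.standard_normal((512, 512)) * 0.01
W2 = rng.standard_normal((512, 512)) * 0.01

def mapper(w):
    """One forward pass: predict an edit direction for latent w."""
    h = np.maximum(0, w @ W1)  # ReLU hidden layer
    return h @ W2

w = rng.standard_normal(512)
w_edited = w + mapper(w)       # manipulation is a single forward pass
print(w_edited.shape)
```

Once trained, editing a new image costs one forward pass instead of minutes of optimization.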
- Find global directions in a StyleGAN's style space S.
- After finding a global direction, we can apply it to the style code s of any image.
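Applying a global direction is a single vector operation, s' = s + α·Δs, where α controls the manipulation strength (and its sign flips the edit direction). A sketch with stand-in values; the dimensionality of S shown here is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal(6048)        # style code in S space (dimensionality assumed)
delta_s = rng.standard_normal(6048)  # stand-in for a learned global direction
delta_s /= np.linalg.norm(delta_s)   # unit-norm direction for the text prompt

alpha = 3.0                          # manipulation strength
s_edited = s + alpha * delta_s       # same direction works for any latent s
print(s_edited.shape)
```

Because Δs is computed once per prompt, editing new images is essentially free.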