/distill-ccld


Distill CLOOB-Conditioned Latent Diffusion trained on WikiArt

As part of the HugGAN community event, I trained a 105M-parameter latent diffusion model using a knowledge distillation process.
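As a rough illustration of the distillation objective (a toy sketch, not the actual training code: the real teacher and student are diffusion U-Nets operating on VQGAN latents, replaced here by linear maps), the student is trained to match the frozen teacher's predictions with an MSE loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear stand-ins for the denoisers (hypothetical: the real models
# are large U-Nets; only the distillation loop structure is illustrated).
W_teacher = rng.normal(size=(8, 8))   # frozen teacher weights
W_student = np.zeros((8, 8))          # student starts untrained

def predict(z_t, W):
    # Predict the denoising target from a noisy latent z_t.
    return z_t @ W

lr = 0.01
for step in range(2000):
    z_t = rng.normal(size=(16, 8))        # batch of noisy latents
    target = predict(z_t, W_teacher)      # teacher output = distillation target
    pred = predict(z_t, W_student)
    grad = 2 * z_t.T @ (pred - target) / len(z_t)  # gradient of MSE loss
    W_student -= lr * grad                # SGD step on the student only

z_val = rng.normal(size=(64, 8))
mse = float(np.mean((predict(z_val, W_student) - predict(z_val, W_teacher)) ** 2))
```

The key point is that the student never sees ground-truth noise targets directly; it regresses the teacher's outputs, which is what lets a much smaller model approximate the teacher's behavior.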


[Sample image]

Prompt: "A snowy landscape, oil on canvas"


How to use

You need several dependencies from the repositories linked in the CLOOB latent diffusion repository:

  • CLIP
  • CLOOB: the model used to encode images and text in a unified latent space, which provides the conditioning for the latent diffusion.
  • Latent Diffusion: the latent diffusion model definition.
  • Taming Transformers: a pretrained convolutional VQGAN used as an autoencoder between image space and the latent space in which the diffusion runs.
  • v-diffusion: sampling functions for diffusion models with text and/or image prompts.
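Putting the components above together, sampling proceeds roughly as: encode the prompt with CLOOB, iteratively denoise a latent with the conditioned diffusion model, then decode the result with the VQGAN. The sketch below uses toy numpy stand-ins for every component; none of these functions are the real APIs, only the pipeline shape is illustrated:

```python
import numpy as np

rng = np.random.default_rng(1)

def cloob_encode_text(prompt):
    # Hypothetical stand-in for the CLOOB text encoder: maps a prompt
    # to a fixed-size conditioning vector (toy hash-seeded embedding).
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).normal(size=16)

def denoiser(z, t, cond):
    # Toy denoiser: nudges the latent toward the conditioning direction.
    # The real model is a U-Net trained with a diffusion objective.
    return z + 0.1 * (cond - z)

def vqgan_decode(z):
    # Toy stand-in for the VQGAN decoder: latent -> image-shaped array.
    return np.tanh(np.outer(z, z))

def sample(prompt, steps=50):
    cond = cloob_encode_text(prompt)
    z = rng.normal(size=16)          # start from Gaussian noise in latent space
    for t in reversed(range(steps)):
        z = denoiser(z, t, cond)     # iterative denoising, conditioned on text
    return vqgan_decode(z)           # decode the final latent to an image

img = sample("A snowy landscape, oil on canvas")
```

This mirrors the division of labor in the real code: CLOOB handles conditioning, the diffusion model works entirely in latent space, and the VQGAN is only invoked once at the end to produce pixels.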

Example code for sampling images from a text prompt is available in a Colab notebook, or directly in the app source code of the Gradio demo on this Space.

Demo images

[Sample image]

Prompt: "A martian landscape painting, oil on canvas"