
This project uses Dreambooth to personalize a text-to-image model, then applies various text-based editing operations with Prompt-to-Prompt.

Fine-tuning Stable Diffusion and Prompt-to-Prompt Editing Techniques

In this project, I use Dreambooth, a fine-tuning technique that teaches a text-to-image diffusion model a new concept such as a character or style. After fine-tuning, the model can generate contextualized images of the subject in different scenes, poses, and views. You can find the implementation of Dreambooth here.

Dreambooth is a method to personalize text-to-image models given just a few (3-5) images of a subject. However, the Stable Diffusion community has found that using 10 to 12 images leads to better results. Consequently, I fine-tuned the 'runwayml/stable-diffusion-v1-5' model on two sets of 12 images each, featuring two of my friends, identified by the tokens 'bnh' and 'tuki'. For prior preservation, the class prompts were 'Keanu Reeves' and 'Justin Bieber', respectively (both in the human class).

bnh keanu reeves...
tuki justin bieber...
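
The role of the prior-preservation class can be illustrated numerically. Dreambooth adds a second loss term, computed on images of the class prior, to the usual denoising loss on the subject images, so that the model keeps its general notion of the class while learning the new identifier. The snippet below is a minimal plain-Python sketch of that combination, not the training code used in this project; the function names, the toy numbers, and the `prior_weight` parameter are assumptions made for illustration.

```python
def mse(pred, target):
    """Mean squared error between two equally sized lists of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_noise,
                    class_pred, class_noise, prior_weight=1.0):
    """Denoising loss on subject images plus a weighted loss on class-prior images."""
    instance_loss = mse(instance_pred, instance_noise)  # e.g. images of 'bnh'
    prior_loss = mse(class_pred, class_noise)           # e.g. images of the class prior
    return instance_loss + prior_weight * prior_loss


# Toy values standing in for predicted vs. true noise (hypothetical numbers).
loss = dreambooth_loss(
    instance_pred=[0.5, 0.0], instance_noise=[0.0, 0.0],
    class_pred=[1.0, 1.0], class_noise=[1.0, 0.0],
    prior_weight=0.5,
)
```

Setting `prior_weight` to zero in this sketch reduces it to plain fine-tuning on the subject images alone.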

Subsequently, I apply various Prompt-to-Prompt text-based editing operations. Prompt-to-Prompt gives users a simple and intuitive way to edit images: the edit is expressed purely through the text prompt, while the original composition and structure are preserved.

Prompts for the first example ("..." stands for "bnh keanu reeves"):
photo of ... wearing a pair of sunglasses on a beach
van gogh painting of ... wearing a pair of sunglasses on a beach

Prompts for the second example ("..." stands for "tuki justin bieber"):
photo of ... wearing a pair of sunglasses and taking a selfie in front of a mirror
van gogh painting of ... wearing a pair of sunglasses and taking a selfie in front of a mirror

Text-Only Localized Editing

Localized editing changes one part of the image by modifying only the corresponding words in the user-provided prompt, while preserving the spatial layout, geometry, and semantics of the rest.

A painting of a bnh keanu reeves eating a ... (burger replaced with cake)
A painting of a tuki justin bieber eating a ... (burger replaced with lasagne)
Image columns: Original, Without Prompt-to-Prompt, With Prompt-to-Prompt
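
The word-swap edits above (burger to cake, burger to lasagne) rely on reusing cross-attention maps from the original generation. A rough sketch of the idea, assuming a simplified setting: during the first denoising steps the edited generation uses the attention maps recorded for the original prompt, which fixes the layout, and afterwards it uses its own maps so the new word can shape the details. The function name, the `inject_steps` parameter, and the string labels standing in for attention tensors are hypothetical; this is not the code used in this project.

```python
def word_swap_attention(source_maps, target_maps, inject_steps):
    """Choose, per denoising step, which cross-attention map the edited image uses.

    source_maps / target_maps: one map per denoising step, in sampling order.
    inject_steps: number of initial steps that reuse the source prompt's maps.
    """
    if len(source_maps) != len(target_maps):
        raise ValueError("both generations must use the same number of steps")
    return [
        source_maps[step] if step < inject_steps else target_maps[step]
        for step in range(len(source_maps))
    ]


# Labels stand in for real attention tensors (illustrative only).
source = [f"burger_step{i}" for i in range(5)]
target = [f"cake_step{i}" for i in range(5)]
mixed = word_swap_attention(source, target, inject_steps=3)
```

In this sketch, a larger `inject_steps` keeps the result closer to the original composition, while a smaller value gives the swapped word more freedom to change the shape of the object.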

Global Editing

Global editing affects all parts of the image but still retains the original composition.

Prompts for the first example ("..." stands for "a bnh keanu reeves"):
(Original) drawing of ... on a snowy mountain
van gogh painting of ... on a snowy mountain
van gogh painting of ... in the jungle
van gogh painting of ... in a river
van gogh painting of ... in the desert

Prompts for the second example ("..." stands for "a tuki justin bieber wearing a sunglasses in a forest"):
(Original) photo of ...
charcoal painting of ...
impressionism painting of ...
neo classical painting of ...
watercolor painting of ...
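
One way to picture how a global edit can keep the composition is the prompt-refinement rule from Prompt-to-Prompt: words shared by the original and edited prompts reuse the original attention, while newly added words use their own. The sketch below is a simplification under stated assumptions: it aligns whitespace-separated words with Python's `difflib` rather than tokenizer tokens, ignores the step schedule, and uses hypothetical function names (`align_tokens`, `refine_attention`). It is not the implementation used in this project.

```python
from difflib import SequenceMatcher


def align_tokens(source_prompt, target_prompt):
    """Map each target word index to its matching source word index, or None if the word is new."""
    src, tgt = source_prompt.split(), target_prompt.split()
    mapping = [None] * len(tgt)
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.b + k] = block.a + k
    return mapping


def refine_attention(source_cols, target_cols, mapping):
    """Per target word: reuse the aligned source attention column, else keep the target's own."""
    return [
        target_cols[j] if src_j is None else source_cols[src_j]
        for j, src_j in enumerate(mapping)
    ]


mapping = align_tokens(
    "drawing of a bnh keanu reeves on a snowy mountain",
    "van gogh painting of a bnh keanu reeves on a snowy mountain",
)
```

Here the three new words at the start of the edited prompt have no counterpart in the original prompt, so they keep their own attention, and every other word is tied to the original layout.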

Attention Re-weighting

By re-scaling the attention of the specified word, we can control the extent to which it influences the generated image.

bnh keanu reeves wearing a pair of sunglasses under a blossom(↓) tree
tuki justin bieber wearing a pair of sunglasses under a blossom(↑) tree
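
The re-weighting described above can be sketched as multiplying the attention that every image location pays to one word by a scale factor: below 1 to weaken the word (↓), above 1 to strengthen it (↑). This is a minimal plain-Python illustration; the function name, the shortened prompt, the whitespace word splitting, and the toy attention values are assumptions, and a real pipeline would operate on tokenizer tokens and attention tensors.

```python
def reweight_attention(attention, prompt, word, scale):
    """Scale the attention column(s) of `word`.

    attention[i][j] is the weight image location i puts on prompt word j.
    """
    tokens = prompt.split()
    if word not in tokens:
        raise ValueError(f"{word!r} does not appear in the prompt")
    targets = {j for j, tok in enumerate(tokens) if tok == word}
    return [
        [w * scale if j in targets else w for j, w in enumerate(row)]
        for row in attention
    ]


# Shortened prompt and toy values for two image locations (illustrative only).
prompt = "sunglasses under a blossom tree"
attention = [
    [0.5, 0.125, 0.125, 0.125, 0.125],
    [0.25, 0.25, 0.0, 0.25, 0.25],
]
weaker = reweight_attention(attention, prompt, "blossom", scale=0.5)    # blossom(↓)
stronger = reweight_attention(attention, prompt, "blossom", scale=2.0)  # blossom(↑)
```

Only the column for "blossom" changes in this sketch; the attention to all other words is left untouched.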