In this project, I use Dreambooth, a fine-tuning technique that teaches a text-to-image diffusion model a new concept such as a character or a style. After fine-tuning, the model can generate contextualized images of the subject in different scenes, poses, and views. You can find the implementation of Dreambooth here.
Dreambooth is a method to personalize text-to-image models given just a few (3-5) images of a subject. However, the Stable Diffusion community has found that using 10 to 12 images leads to better results. I therefore fine-tuned the 'runwayml/stable-diffusion-v1-5' model on two sets of 12 images, one for each of two friends, identified by the tokens 'bnh' and 'tuki'. The respective prior-preservation classes were 'Keanu Reeves' and 'Justin Bieber' (human class).
Samples generated with the prompts "bnh keanu reeves..." and "tuki justin bieber...".
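The setup above can be sketched in a few lines. This is only an illustration, not the training script used here: the `Subject` class, the `dreambooth_loss` helper, and the `prior_weight` parameter are hypothetical names, and plain Python lists stand in for the noise tensors a real trainer would use. It shows how each identifier is paired with its class to form the instance prompt, and how Dreambooth's prior-preservation objective adds a weighted class-image term to the instance reconstruction term.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    identifier: str  # rare token bound to the new subject, e.g. "bnh"
    class_name: str  # class used for prior preservation

    @property
    def instance_prompt(self) -> str:
        # Prompt paired with the subject's own photos.
        return f"{self.identifier} {self.class_name}"

    @property
    def class_prompt(self) -> str:
        # Prompt paired with the class (prior) images.
        return self.class_name


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_target,
                    class_pred, class_target, prior_weight=1.0):
    """Instance loss plus weighted prior-preservation loss.

    `prior_weight` is a hypothetical name for the weighting factor.
    """
    instance_term = mse(instance_pred, instance_target)
    prior_term = mse(class_pred, class_target)
    return instance_term + prior_weight * prior_term


subjects = [
    Subject("bnh", "keanu reeves"),
    Subject("tuki", "justin bieber"),
]
for subject in subjects:
    print(subject.instance_prompt, "|", subject.class_prompt)
```

The prior term is what keeps the model from forgetting what the class looks like in general while it learns the new identifier.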
Subsequently, I apply various Prompt-to-Prompt text-based editing operations. Prompt-to-Prompt gives users a simple and intuitive way to edit images through text alone while preserving the original composition and structure.
Localized editing modifies part of the user-provided prompt (for example, swapping a single word) while preserving the spatial layout, geometry, and semantics of the original image.
| Prompt | Original | Without Prompt-to-Prompt | With Prompt-to-Prompt |
|---|---|---|---|
| A painting of a bnh keanu reeves eating a... | burger | cake | cake |
| A painting of a tuki justin bieber eating a... | burger | lasagne | lasagne |
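The mechanism behind a word swap such as burger to cake can be sketched as follows. In Prompt-to-Prompt, the cross-attention maps from the original prompt are injected into the edited generation during the early denoising steps, which is what fixes the layout. The function name `edit_attention`, the `replace_fraction` parameter, and the nested-list maps are assumptions made for illustration; a real pipeline would hook this into the attention layers of the diffusion model.

```python
def edit_attention(source_map, target_map, step, num_steps, replace_fraction=0.8):
    """Choose which cross-attention map to use at one denoising step.

    `step` counts from 0 (the first, noisiest step). For the first
    `replace_fraction` of the steps the map from the original prompt is
    injected, so the composition is kept; afterwards the map from the
    edited prompt is used, so the new word can shape the details.
    """
    if not 0.0 <= replace_fraction <= 1.0:
        raise ValueError("replace_fraction must be between 0 and 1")
    inject_until = round(replace_fraction * num_steps)
    return source_map if step < inject_until else target_map


# Toy maps: one pixel attending to two prompt tokens.
source_map = [[0.7, 0.3]]  # from "... eating a burger"
target_map = [[0.4, 0.6]]  # from "... eating a cake"

schedule = [
    "source" if edit_attention(source_map, target_map, s, 10, 0.5) is source_map
    else "target"
    for s in range(10)
]
print(schedule)
```

A larger `replace_fraction` preserves more of the original geometry, at the cost of giving the new word less freedom to change the object's shape.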
Global editing affects all parts of the image but still retains the original composition.
(Original) photo of ... | charcoal painting of ... | impressionism painting of ... | neoclassical painting of ... | watercolor painting of ... |
...a tuki justin bieber wearing a sunglasses in a forest |
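One way to picture how a style change can keep the composition is token alignment: tokens shared by both prompts reuse the attention of the original prompt, while new tokens (the style words) keep their own. The sketch below is an assumption-laden illustration, not the code used for these results. It splits prompts on whitespace instead of using the model's tokenizer, and `align_tokens` and `refine_attention_row` are hypothetical helper names.

```python
from difflib import SequenceMatcher


def align_tokens(source_prompt, target_prompt):
    """Map each target token index to its source token index, or None."""
    source = source_prompt.split()
    target = target_prompt.split()
    alignment = [None] * len(target)
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            alignment[block.b + offset] = block.a + offset
    return alignment


def refine_attention_row(source_row, target_row, alignment):
    """Reuse source attention for shared tokens, keep target attention for new ones."""
    return [
        target_row[j] if alignment[j] is None else source_row[alignment[j]]
        for j in range(len(target_row))
    ]


scene = "a tuki justin bieber wearing a sunglasses in a forest"
alignment = align_tokens(f"photo of {scene}", f"charcoal painting of {scene}")
print(alignment)
```

Here the two style tokens have no counterpart in the original prompt, so they map to `None`, and every token of the shared scene description maps back to its position in the original prompt.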
By re-scaling the attention of the specified word, we can control the extent to which it influences the generated image.
bnh keanu reeves wearing a pair of sunglasses under a blossom(↓) tree |
tuki justin bieber wearing a pair of sunglasses under a blossom(↑) tree |
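Attention re-weighting can be sketched as scaling one column of the cross-attention map. In the example below, each row is one pixel's attention over the prompt tokens, and the column of the chosen word is multiplied by a factor. The name `reweight_attention`, the whitespace tokenization, and the uniform toy map are assumptions for illustration; reading (↓) as a scale below 1 and (↑) as a scale above 1 is also my interpretation of the captions.

```python
def reweight_attention(attention, tokens, word, scale):
    """Scale the attention every pixel pays to `word`.

    attention: list of rows, one per pixel, each with one weight per token.
    scale < 1 weakens the word's influence, scale > 1 strengthens it.
    """
    if word not in tokens:
        raise ValueError(f"{word!r} is not in the prompt")
    index = tokens.index(word)
    return [
        [weight * scale if j == index else weight for j, weight in enumerate(row)]
        for row in attention
    ]


tokens = "bnh keanu reeves wearing a pair of sunglasses under a blossom tree".split()
attention = [[1.0] * len(tokens)]  # toy map: a single pixel, uniform weights

weaker = reweight_attention(attention, tokens, "blossom", 0.5)
stronger = reweight_attention(attention, tokens, "blossom", 2.0)
print(weaker[0][tokens.index("blossom")], stronger[0][tokens.index("blossom")])
```

Only the column for the chosen word changes; all other attention weights are left untouched, which is why the rest of the image stays stable.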