omriav/blended-diffusion

Scribble-guided editing

wileewang opened this issue · 2 comments

Hi! I wonder if a loss such as MSE or LPIPS is used between the user-provided scribbles and the scribbled regions of $\widehat{x}_0$ , in addition to the CLIP loss. I am curious how the shapes and colors stay consistent when only text with no specific description, e.g., "blanket" in Fig 9, is given.

Hi,

Thank you for your interest in our work.
No, there is no need for an MSE/LPIPS loss; the only signal for the scribbles comes from partially noising the image (i.e., noising it only up to a certain intermediate noise level).
The shapes and the colors stay somewhat consistent because of the way the diffusion model operates: the initial denoising stages generate a rough sketch of the image, and the finer details are added later, so we can noise the image only up to the point that preserves the colors/shapes. For more details, please see Figure 32 in the paper.
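To illustrate the idea (this is a simplified sketch, not the repo's actual code), partial noising is just the standard DDPM forward process applied up to an intermediate timestep `t` instead of the final one. The beta schedule and timestep value below are illustrative assumptions:

```python
import numpy as np

def partial_noise(x0, t, alpha_bar, rng):
    """Noise a (scribbled) image to an intermediate diffusion step t.

    Standard DDPM forward process:
        x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * eps

    A smaller t keeps more of the scribble's coarse colors/shapes;
    a larger t lets the reverse process repaint more of the region.
    """
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

# Illustrative linear beta schedule with 1000 steps (a common choice).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(64, 64, 3))  # stand-in for a scribbled image
x_mid = partial_noise(x0, t=400, alpha_bar=alpha_bar, rng=rng)
```

Reverse denoising would then start from `x_mid` at step `t` rather than from pure noise, which is why the scribble's rough layout survives while the details are regenerated by the model.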

I see. Thanks for the explanation.