AUTOMATIC1111/stable-diffusion-webui

Img2Img alt should divide by sigma[-1], not std

briansemrau opened this issue · 6 comments

While the noise used in stable diffusion is generated from a normal distribution, that doesn't mean any given sample is perfectly normal. Normalizing the reverse-generated noise to unit standard deviation therefore produces incorrect saturation in the output image.

I've been able to solve the saturation issue by dividing by the first sigma instead. I also spent some time verifying the img2imgalt technique by attempting to re-derive it, and dividing by the first sigma appears to be the correct approach.

See my contribution to another repo here:
code: https://github.com/sd-webui/stable-diffusion-webui/blob/17748cbc9c34df44d0381c42e4f0fe1903089438/scripts/sd_utils.py#L525
original pr: https://github.com/sd-webui/stable-diffusion-webui/pull/1070/files#diff-2e278c1b9a8c0e308b8272729de19c973ac24710b3467dfb9c877db5d1cf7a3f
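For context, the shape of the change is just the final rescaling of the reconstructed noise. A minimal sketch (an illustrative helper, not the actual script function; it assumes the flipped sigma schedule, so sigmas[-1] is the largest sigma):

```python
import torch

def rescale_reconstructed_noise(x: torch.Tensor, sigmas: torch.Tensor) -> torch.Tensor:
    """Illustrative helper, not the actual script function.

    x      : latent after stepping the reversed Euler loop up the sigma schedule
    sigmas : the flipped (ascending) schedule, so sigmas[-1] is the largest sigma
    """
    # Previous behaviour: normalize to unit std. Because the recovered noise is
    # not guaranteed to be perfectly normal, this rescaling shifts saturation.
    #   return x / x.std()

    # Proposed fix: divide by sigmas[-1], the largest sigma (i.e. the first
    # sigma of the un-flipped sampling schedule).
    return x / sigmas[-1]
```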

Edit:
The changes by MartinCairnsSQL below are also required. They're from the original author's gist.
#736 (comment)

Dumb/random question, but why does it only work with Euler, and would it be possible to make it work with Euler_A/something else?

It only works with Euler because the code is just the Euler sampler run in reverse. To support other samplers, each one would have to be similarly inverted. It shouldn't be too hard for some of them, but it's not trivial either.
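For intuition, here is a minimal sketch of the idea, with illustrative names rather than the actual script: the update is the same Euler step the sampler uses, just run with the sigma schedule ascending, and `denoise` stands in for the CFG-combined model prediction. Ancestral samplers such as Euler a inject fresh random noise at every step, so there is no deterministic update to run backwards.

```python
import torch

def euler_step(x, denoised, sigma_cur, sigma_next):
    # k-diffusion style Euler update: follow the derivative estimate d from
    # sigma_cur to sigma_next (sigma_next < sigma_cur when sampling normally).
    d = (x - denoised) / sigma_cur
    return x + d * (sigma_next - sigma_cur)

def reconstruct_noise(x, denoise, sigmas_up):
    # sigmas_up: the schedule flipped so it ascends towards sigma_max. Running
    # the identical update with increasing sigma walks an image back to noise.
    # Assumes sigmas_up[0] > 0; the real flipped schedule starts at 0, which is
    # why the script needs the i == 1 special cases discussed below.
    for i in range(1, len(sigmas_up)):
        sigma_cur, sigma_next = sigmas_up[i - 1], sigmas_up[i]
        denoised = denoise(x, sigma_cur)  # hypothetical model call (uncond/cond combined)
        x = euler_step(x, denoised, sigma_cur, sigma_next)
    return x
```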

This just gives me blurry images for settings that used to work. What settings reproduce the author's original images with your code?

It gives me blurry images too.

I had a look at the diffs in the method and found that the variables sigma_in, t, and d also need changes. I applied them to a copy of the script and tested a few images: the original version had too much contrast for the same set of prompts, and with these extra changes the sigma change no longer blurs the image. The before/after for each of the three lines is shown below.

img2imgalt_sigma.zip

Original:
    sigma_in = torch.cat([sigmas[i] * s_in] * 2)
Changed:
    sigma_in = torch.cat([sigmas[i - 1] * s_in] * 2)

Original:
    t = dnw.sigma_to_t(sigma_in)
Changed:
    if i == 1:
        t = dnw.sigma_to_t(torch.cat([sigmas[i] * s_in] * 2))
    else:
        t = dnw.sigma_to_t(sigma_in)

Original:
    d = (x - denoised) / sigmas[i]
Changed:
    if i == 1:
        d = (x - denoised) / (2 * sigmas[i])
    else:
        d = (x - denoised) / sigmas[i - 1]
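Putting the three replacements together with the sigmas[-1] rescaling from the top of the thread, the reconstruction loop would look roughly like this. This is a sketch, not the exact script: it assumes the k-diffusion CompVisDenoiser wrapper and a CompVis-style latent diffusion model exposing apply_model, and the wrapper function name is illustrative.

```python
import torch
import k_diffusion as K
from tqdm import trange

def find_noise_for_image_sigma_adjusted(model, x, cond, uncond, cfg_scale, steps):
    # x is the init latent of the input image; cond/uncond are the prompt conditionings.
    dnw = K.external.CompVisDenoiser(model)
    sigmas = dnw.get_sigmas(steps).flip(0)                  # ascend from 0 to sigma_max
    s_in = x.new_ones([x.shape[0]])

    for i in trange(1, len(sigmas)):
        x_in = torch.cat([x] * 2)
        sigma_in = torch.cat([sigmas[i - 1] * s_in] * 2)    # was sigmas[i]
        cond_in = torch.cat([uncond, cond])

        c_out, c_in = [K.utils.append_dims(k, x_in.ndim) for k in dnw.get_scalings(sigma_in)]

        if i == 1:
            # sigmas[0] is 0 on the flipped schedule, so take the timestep from sigmas[1]
            t = dnw.sigma_to_t(torch.cat([sigmas[i] * s_in] * 2))
        else:
            t = dnw.sigma_to_t(sigma_in)

        eps = model.apply_model(x_in * c_in, t, cond=cond_in)
        denoised_uncond, denoised_cond = (x_in + eps * c_out).chunk(2)
        denoised = denoised_uncond + (denoised_cond - denoised_uncond) * cfg_scale

        if i == 1:
            d = (x - denoised) / (2 * sigmas[i])            # avoid dividing by sigmas[0] == 0
        else:
            d = (x - denoised) / sigmas[i - 1]              # was sigmas[i]

        x = x + d * (sigmas[i] - sigmas[i - 1])

    # divide by the largest sigma rather than normalizing by std (the fix from the top of the thread)
    return x / sigmas[-1]
```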

These changes look good.

Would it be possible to combine img2imgalt with masked inpaint in some way?
By separating the original prompt from the img2imgalt prompt, we are essentially telling the program which parts we want changed. But sometimes it is hard for the program to key in on those areas. When trying to change the hair color, it would only change part of the hair, and I would have to throw the settings too far out of whack, producing gibberish in the process, in order to cover the full hair. If I could use a mask to assist, or even generate one automatically with a tool like txt2mask, would that not yield a better outcome?