kjsman/stable-diffusion-pytorch

Image in latent space gets shifted during encoding.

treeform opened this issue · 3 comments

I am using a simple red image as input:

[image: red.png, a solid red input image]

from stable_diffusion_pytorch import pipeline
from PIL import Image

prompts = ["a photograph of an astronaut riding a horse"]
input_images = [Image.open('red.png')]  # solid red image used as the img2img input
images = pipeline.generate(prompts, input_images=input_images)
images[0].save('output.png')

But the input image comes out shifted down and right by 8px, and it generates an ugly brown border:

[image: output.png, shifted down/right with a brown border]

I am pretty sure it happens during the Encode pass, as it's already shifted in latent space. Here is a custom dump of the latent space to an image:

[image: encode/decode latent-space dump]

Something in the Encode pass is shifting it by one pixel in latent space, and I can't figure out what.
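For reference, a dump like that can be as simple as mapping three of the four latent channels to RGB. A minimal sketch in plain PyTorch; how you obtain the latent z depends on how you call the encoder, so that part is left out:

import torch
from PIL import Image

def latent_to_image(z: torch.Tensor) -> Image.Image:
    """Visualize a (1, 4, H, W) latent by mapping its first three
    channels to RGB. The color mapping is arbitrary; it is only meant
    to make spatial misalignment visible."""
    z = z[0, :3]                                    # drop batch dim, keep 3 channels
    z = (z - z.min()) / (z.max() - z.min() + 1e-8)  # normalize to [0, 1]
    arr = (z * 255).byte().permute(1, 2, 0).cpu().numpy()
    return Image.fromarray(arr)

# z = <latent tensor from the encoder>, e.g. shape (1, 4, 64, 64)
# latent_to_image(z).save('latent.png')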

Wow, nice catch! It turns out I implemented the encoder's padding differently from the original (compare kjsman/stable-diffusion-pytorch with CompVis/stable-diffusion).

The quick fix -- removing the padding from the downsampling Conv2d layer and applying it in the forward method instead (because PyTorch's Conv2d does not support asymmetric padding) -- will be pushed soon.
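For illustration, a minimal sketch of that quick fix (the module name and channel count here are made up; only the padding scheme matters -- pad right and bottom only, as the original CompVis downsampling block does):

import torch
from torch import nn
from torch.nn import functional as F

class Downsample(nn.Module):
    # Quick-fix sketch: the Conv2d itself carries no padding; the
    # asymmetric (right/bottom-only) zero padding is applied in forward.
    def __init__(self, channels: int = 128):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, stride=2, padding=0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # F.pad takes (left, right, top, bottom); padding only right and
        # bottom keeps the output aligned with the input's top-left corner
        # instead of shifting everything down/right in latent space.
        x = F.pad(x, (0, 1, 0, 1), mode='constant', value=0)
        return self.conv(x)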

The better fix -- adding an nn.ZeroPad2d layer -- requires revising the weight file. The problem is, I lost the weight conversion script (this might also be the answer to your earlier issue). It was on my old laptop; I migrated to a new laptop without that script and then erased the old one. I think I should rewrite the script soon, but I can't promise an ETA...
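A sketch of what that better fix could look like, assuming the encoder is built as an nn.Sequential (the channel count is again illustrative):

import torch
from torch import nn

# The asymmetric padding becomes an explicit, parameter-free layer.
# Inserting it shifts the position of every later layer in the
# nn.Sequential, so the keys in the saved state dict no longer match,
# which is why the weight file has to be regenerated.
downsample = nn.Sequential(
    nn.ZeroPad2d((0, 1, 0, 1)),                               # pad right and bottom only
    nn.Conv2d(128, 128, kernel_size=3, stride=2, padding=0),  # no symmetric padding
)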

Thank you, the fix works great!

Also, about the weights: I think I have figured that out: #7 (comment)