Image Editing
egeozsoy opened this issue · 13 comments
Do you have any suggestions for generating images conditioned not only on text, but also on a masked image, as OpenAI describes in their blog https://openai.com/dall-e-2/?
You would need to train it with an inpainting task. In particular, the decoder UNet needs to take in a mask input, concatenated with the masked image on the channel dimension, to predict the original image. Right now I think this feature is not implemented in this repo.
I think this would be a nice addition at some point, if I do anything on this regard will let you know :)
It would be good to train an all-in-one model that inpaints as needed, or does full image generation by simply giving an all-zero mask.
Agreed, it would be a complementary task, so training on both tasks at the same time should likely not hurt overall performance.
Correction to the above, the masked image and mask would also need to be concatenated to x (aka the noised image).
So if one model is trained for both tasks at the same time, we would feed noised_image + empty masked image + empty mask during normal training, and noised_image + masked_original_image + mask during inpainting training (where + denotes channel concatenation); a rough sketch is below.
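For illustration, a minimal sketch of assembling that conditioning (assumptions: `build_unet_input` is just a hypothetical helper, not anything in this repo, and the mask convention of 1 = region to regenerate is my own choice):

```python
import torch

def build_unet_input(noised_image, original_image=None, mask=None):
    # noised_image: (b, c, h, w) -- the x_t currently being denoised
    # mask: (b, 1, h, w), 1 where content should be regenerated, 0 where it is known
    b, c, h, w = noised_image.shape
    if mask is None:
        # full image generation: all-zero mask and all-zero masked image
        mask = torch.zeros(b, 1, h, w, device=noised_image.device)
        masked_image = torch.zeros_like(noised_image)
    else:
        # inpainting: keep the known region, zero out the region to be filled in
        masked_image = original_image * (1. - mask)
    # concatenate along the channel dimension -> (b, 2c + 1, h, w)
    return torch.cat((noised_image, masked_image, mask), dim=1)
```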
Do we have to add noise to the entire image, or is it enough to add noise only to the masked part? Not exactly the most scientific resource, but check the video at https://openai.com/dall-e-2/, timestamp 2:37 ("monkey paying taxes"). It seems like they input an image where only the masked part is noised.
During training, only x is noised and denoised; the masked image and mask are used directly, if I understand the various pieces of OpenAI's GLIDE and DDPM code correctly.
Taken from the GLIDE paper: "Most previous work that uses diffusion models for inpainting has not trained diffusion models explicitly for this task (Sohl-Dickstein et al., 2015; Song et al., 2020b; Meng et al., 2021). In particular, diffusion model inpainting can be performed by sampling from the diffusion model as usual, but replacing the known region of the image with a sample from q(xt|x0) after each sampling step."
So maybe a short-term solution could be to adapt the sampling logic to allow inpainting in this fashion?
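Something like this, perhaps (a hedged sketch of that GLIDE-style trick; `q_sample` and `p_sample` are placeholders for whatever forward/reverse diffusion helpers are in use, not this repo's API):

```python
import torch

@torch.no_grad()
def inpaint_sample(model, x0_known, mask, timesteps, q_sample, p_sample):
    # x0_known: original image whose unmasked region should be preserved
    # mask: 1 where new content is generated, 0 where x0_known is kept
    # q_sample(x0, t): draws x_t ~ q(x_t | x_0) (forward diffusion)
    # p_sample(model, x_t, t): one reverse denoising step x_t -> x_{t-1}
    img = torch.randn_like(x0_known)
    for t in reversed(range(timesteps)):
        img = p_sample(model, img, t)
        # replace the known region with a forward-diffused version of the
        # original image at the matching noise level, as in the GLIDE paper
        known = q_sample(x0_known, t - 1) if t > 0 else x0_known
        img = img * mask + known * (1. - mask)
    return img
```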
yea, depending on some circumstances next week, i could build this, let's leave this open
It would be good to train an all-in-one model that inpaints as needed, or does full image generation by simply giving an all-zero mask.
yup, this is the most ideal case :)
Alternatively, can you not just finetune the generation model for inpainting? You would only have to change the input layer; the rest of the weights in the network should carry over.
i think i'm going to aim for integrating this technique: https://github.com/andreas128/RePaint. it is a pretty recent paper, but the results look good. can use this resampling technique for both dalle2 and imagen
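For context, the core of RePaint's resampling loop is roughly the following (a sketch under assumptions: `q_sample`, `p_sample`, and `undo_step` are hypothetical helpers standing in for the usual diffusion utilities, not this repo's API, and the mask convention matches the sketches above):

```python
import torch

@torch.no_grad()
def repaint_sample(model, x0_known, mask, timesteps, resample_steps,
                   q_sample, p_sample, undo_step):
    # mask: 1 where content is generated, 0 where x0_known is kept
    # q_sample(x0, t): forward-diffuses the original image to noise level t
    # p_sample(model, x_t, t): one reverse denoising step x_t -> x_{t-1}
    # undo_step(x, t): re-noises x_{t-1} back to x_t (one forward step)
    img = torch.randn_like(x0_known)
    for t in reversed(range(timesteps)):
        for r in range(resample_steps):
            # denoise the unknown region, forward-diffuse the known region
            x_unknown = p_sample(model, img, t)
            x_known = q_sample(x0_known, t - 1) if t > 0 else x0_known
            img = x_unknown * mask + x_known * (1. - mask)
            # resampling: jump one step back up the diffusion so the two
            # regions can be harmonized on the next pass
            if r < resample_steps - 1 and t > 0:
                img = undo_step(img, t)
    return img
```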
ok it is done https://github.com/lucidrains/dalle2-pytorch#inpainting