mikonvergence/ControlNetInpaint

Unexpected results when using the Colab example with other images

Closed this issue · 7 comments

Hello,

I'm trying to use the provided Google Colab notebook to mask out a piece of clothing in an image of a person and change that clothing with a text prompt (its color, for example), but I'm running into issues with the generated image. Specifically, the output is of poor quality and looks jumbled.

Here are my inputs:
A person wearing a piece of clothing.

image

A person wearing a grey piece of clothing (representing no clothing).

image

Prompt

text_prompt="A woman wearing a green shirt"

The setup seems intuitive; however, the output image is not what I expected. I've followed the instructions in the repo, but I still can't get satisfactory results.

OUTPUT
image

Notes:

  1. I tried converting the grey color in the mask image to black to see if it would give better results, but unfortunately it did not.
  2. I tried computing the Canny edges from both the original image and the mask image to see whether it made a difference, but the generated image still looked like this.

Could you please provide some guidance on how to improve the output image quality? If there are any known issues or limitations with the current implementation, please let me know as well.

Cheers
Seth

Hello, @sethupavan12! Can you provide the lines of code responsible for computing the mask, the canny edge, and calling the pipeline?

Yes.

Here is the Colab link: https://colab.research.google.com/drive/1ypuZay9gBhRHqDkCc6ccHGt_d6P8Hdac?usp=sharing

Let me know if you would like me to send the images (or you can right-click and save them).

The mask image must be a binary mask, so the presence of a real image in it is causing unexpected behaviour.

Make sure your mask is a PIL image of shape (512,512) with only 2 values (0 and 255, where 255 indicates a pixel that should be inpainted).
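For illustration, here is a minimal sketch of forcing a mask into that form, assuming Pillow is available. The helper name `to_binary_mask` and the threshold of 128 are hypothetical choices, not part of this repo. It also assumes the mask is already a roughly white-on-black image; it will not extract a mask from a photo where the region is merely painted grey.

```python
from PIL import Image


def to_binary_mask(mask, size=(512, 512), threshold=128):
    """Return a single-channel PIL mask containing only 0 and 255.

    Hypothetical helper: pixels at or above `threshold` become 255
    (to be inpainted), everything else becomes 0 (kept).
    """
    # Convert to grayscale and resize with nearest-neighbour sampling
    # so that no intermediate grey values are introduced.
    gray = mask.convert("L").resize(size, Image.NEAREST)
    # Threshold every pixel to exactly 0 or 255.
    return gray.point(lambda p: 255 if p >= threshold else 0)
```

Checking `mask_image.getextrema()` afterwards should report `(0, 255)` for a mask that contains both kept and inpainted pixels.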

I see. That makes sense. Thanks for the suggestion. I will give it a shot and let you know.

@mikonvergence So, I did use a binary mask this time around and the generated image has now improved.
Thanks so much for that.

This is the output I got for the text prompt "A woman wearing a red shirt":

image

Do you have any suggestions to make it follow my instructions more closely?

This is what I got with the same prompt but asking for a pink shirt:

image

I am confused as to why, in the first picture, the color was applied only to the logo and not to the entire shirt.

You can try to increase the influence of text using the guidance_scale parameter (default value is 7.5).

I would also check that the mask covers the entire grey shirt and that there is no small leakage near the mask border (it might be good to add a small margin to the mask). Basically, make sure there is nothing left in the image that suggests the shirt is grey.
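One possible way to add such a margin is to dilate the binary mask. This is only a sketch, assuming Pillow; the helper name `grow_mask` and the default margin of 8 pixels are hypothetical and would need tuning per image.

```python
from PIL import Image, ImageFilter


def grow_mask(mask, margin=8):
    """Expand the white (255) region of a binary mask by `margin` pixels.

    Hypothetical helper: uses a square max filter as a simple dilation.
    """
    # MaxFilter requires an odd kernel size; 2 * margin + 1 grows the
    # white region by `margin` pixels in every direction.
    return mask.convert("L").filter(ImageFilter.MaxFilter(2 * margin + 1))
```

Because the max of values drawn from 0 and 255 is still 0 or 255, the result stays binary as long as the input mask was binary.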

It could also be a leakage from the latent of the original image (derived from the image argument). If the above doesn't work, try to cut out (replace with zeros) the masked area from the image before feeding it into the pipe.
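A rough sketch of that cut-out step, assuming Pillow and a binary mask where 255 marks the area to inpaint. The helper name `blank_masked_area` is hypothetical, and whether this actually removes the colour leakage is something to verify experimentally.

```python
from PIL import Image


def blank_masked_area(image, mask):
    """Return a copy of `image` with the masked (255) pixels set to zero.

    Hypothetical helper: `mask` is expected to be binary (0 or 255).
    """
    rgb = image.convert("RGB")
    # Make sure the mask is single-channel and matches the image size.
    m = mask.convert("L").resize(rgb.size, Image.NEAREST)
    black = Image.new("RGB", rgb.size, (0, 0, 0))
    # Image.composite takes pixels from the first image where the mask
    # is 255 and from the second image where the mask is 0.
    return Image.composite(black, rgb, m)
```

The blanked image would then be passed as the `image` argument, with the mask itself left unchanged.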

Got it!

Thank you again!