cientgu/InstructDiffusion

Can this work output a full-image segmentation mask (non-transparent), rather than a semi-transparent mask specific to an object?


This is excellent research work, and I would like to thank all the authors for their efforts!

I have a little question about this work:

Can this work output a full-image segmentation mask (non-transparent), rather than a semi-transparent mask specific to an object?

Thank you for your interest in our work. I have two suggestions:

1. You can set the transparency parameter of the segmentation dataset to 0 during training, so the trained model will predict opaque masks.
2. You can also train a post-processing network on top of our model, with the semi-transparent mask image generated by our model as input and the opaque mask you expect as output.
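To make the first suggestion concrete, here is a minimal sketch of the kind of alpha-blending typically used to build semi-transparent segmentation targets, assuming the target is a colored overlay composited onto the input image. The function name, argument names, and blending formula are illustrative assumptions, not the actual InstructDiffusion dataset code; only the idea of setting the transparency to 0 to obtain an opaque mask comes from the reply above.

```python
import numpy as np

def build_segmentation_target(image, mask, color=(255, 0, 0), transparency=0.5):
    """Composite a colored mask over the image.

    transparency=0.5 gives a semi-transparent overlay; transparency=0.0 makes
    the masked region fully opaque (pure mask color), which is what the
    maintainer suggests for training an opaque-mask model.
    NOTE: this is a hypothetical sketch, not InstructDiffusion's actual code.
    """
    image = image.astype(np.float32)
    color = np.array(color, dtype=np.float32)
    mask = mask.astype(bool)

    target = image.copy()
    # Inside the mask, mix the original pixels with the mask color.
    target[mask] = transparency * image[mask] + (1.0 - transparency) * color
    return target.astype(np.uint8)
```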

Thanks for the quick response; my doubts were answered. It would be better if the parameter for adjusting transparency could be exposed in the WebDemo. Excellent work.

Note that the current model does not support setting the transparency directly. You would need to modify the transparency parameter in the dataset and retrain a new model to support it.

Thank you for sharing this excellent work. I have a silly question: why can't we simply take the pixel-wise difference between the result image and the input image to obtain things like segmentation masks or keypoint locations, instead of training a lightweight U-Net for post-processing?

This approach can work, but it has some drawbacks: 1. it is hand-crafted; 2. it is sensitive to hyperparameters, requiring a lot of threshold tuning; 3. it is not very accurate. The main challenge is that, due to the influence of the VQVAE and the diffusion process, the original image and the restored image differ even in areas outside the mask.
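For reference, the pixel-difference baseline described in the question might look like the sketch below. It is purely illustrative (not part of the InstructDiffusion codebase) and mainly shows why the threshold is hard to tune: the VAE round-trip and the diffusion process leave small differences everywhere, not only inside the mask.

```python
import numpy as np

def mask_from_pixel_difference(input_img, edited_img, threshold=20):
    """Naive post-processing: threshold the per-pixel difference.

    Because the VQVAE encode/decode and the diffusion process perturb pixels
    even outside the painted region, the difference map is noisy everywhere,
    so the result is very sensitive to `threshold` (the drawback described
    above). Hypothetical example for uint8 RGB images.
    """
    diff = np.abs(edited_img.astype(np.int16) - input_img.astype(np.int16))
    diff = diff.mean(axis=-1)          # average over RGB channels
    return (diff > threshold).astype(np.uint8)
```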

I see, thanks for the quick reply.