microsoft/X-Decoder

strange results of instance segmentation

Ishmael-7 opened this issue · 3 comments

Hello, thank you for your great work! I ran into some strange results when running the code on COCO images. I used the BestSegTiny model for open-vocabulary instance segmentation, and most of the examples worked very well. However, categories like "man", "woman", "boy", "girl", and "guy" produced strange results. In particular, "man" is always recognized as "sky".
I'm not sure what caused this error, and I'm wondering if using a larger model would correct the results? I'm looking forward to the release of new ckpts.
pic1
pic2

jwyang commented

Can you share the original image and text inputs for the above results?

1
2
3
4
Here are the original images from COCO. I entered only one word in the text input each time, such as 'man', 'woman', etc. Additionally, I attempted the All-in-One Demo, but encountered the same issue once more. I noticed that the words "man", "woman", "boy", "girl", and "guy" seem to cause issues more frequently than other words.
man_error
girl_error

Thanks so much for your interest in our work and your patience in cropping those results! Your usage is correct and inspiring. I think the result is caused by the natural task gap and the training procedure gap between instance segmentation and referring segmentation. For example, queries like girl/man are closer to referring segmentation than to instance segmentation. Please see the following results:

Screenshot 2023-03-18 at 12 23 39 PM

Screenshot 2023-03-18 at 12 22 59 PM

Screenshot 2023-03-18 at 12 30 03 PM

Screenshot 2023-03-18 at 12 30 39 PM
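
For readers unfamiliar with the two tasks named above, here is a toy sketch of how their formulations differ. It is not X-Decoder's code: the embeddings are invented numbers, and `instance_style_labels` and `referring_style_mask` are hypothetical helper names. It only illustrates the general idea that instance-style inference labels every mask from the given vocabulary, while referring-style inference lets the query pick a single mask. Whether this is what actually happens inside the demo is an assumption, not something confirmed in this thread.

```python
import numpy as np


def normalize(x):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def instance_style_labels(mask_emb, class_emb):
    # Per-mask classification: every mask picks its best class from the vocabulary.
    sim = normalize(mask_emb) @ normalize(class_emb).T  # shape (num_masks, num_classes)
    return sim.argmax(axis=1)


def referring_style_mask(mask_emb, query_emb):
    # Per-query grounding: the query picks the single mask that matches it best.
    sim = normalize(mask_emb) @ normalize(query_emb)  # shape (num_masks,)
    return int(sim.argmax())


# Invented toy embeddings (assumption, for illustration only).
mask_emb = np.array([
    [1.0, 0.1, 0.0],  # mask 0: a person-like region
    [0.0, 0.2, 1.0],  # mask 1: a sky-like region
])
man = np.array([0.9, 0.3, 0.1])  # text embedding for the single word "man"

# Instance-style with a one-word vocabulary: each mask can only receive index 0.
labels = instance_style_labels(mask_emb, man[None, :])

# Referring-style: the query selects only the best-matching mask.
best = referring_style_mask(mask_emb, man)

print(labels.tolist(), best)
```

In this toy setup, the instance-style call assigns the only available label to both masks, including the sky-like one, while the referring-style call returns only the person-like mask.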