kevin-ssy/CLIP_as_RNN

Set of referring image segmentation queries


Thanks for your interesting work!!

I could not find the construction details of the initial text queries for referring image segmentation.
As I understand it, open-vocabulary segmentation takes a set of input text queries, which is what makes your recurrent filtering of non-existent concepts necessary. However, since referring image segmentation takes a single image–text pair as input, I do not understand how CaR can eliminate irrelevant texts recurrently in this setting. Knowing what the initial text queries are for this task would fill this gap in my understanding.

If this detail is already covered in the paper, I apologize for asking.

Best regards,

Namyup Kim.

Hi Namyup,

Thank you for your kind interest! For referring segmentation we do not filter out irrelevant texts, so all results are obtained in a single pass. We have released all the code at:

https://github.com/google-research/google-research/tree/master/clip_as_rnn

Please check it out!
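
To make the contrast concrete, here is a minimal sketch of how the two settings differ, based only on the explanation above. The function names (`car_segment`, `filter_queries`) are hypothetical placeholders, not the released API:

```python
# Minimal sketch of the query construction described in this thread.
# `car_segment` and `filter_queries` are hypothetical placeholders,
# not functions from the released clip_as_rnn code.

def referring_segmentation(image, referring_expression, car_segment):
    """Referring segmentation: the single expression is the entire query set,
    so no recurrent filtering is needed and one pass suffices."""
    queries = [referring_expression]      # the only text query is the expression itself
    return car_segment(image, queries)    # single forward pass, nothing is removed

def open_vocab_segmentation(image, vocabulary, car_segment, filter_queries):
    """Open-vocabulary segmentation: start from a set of candidate concepts
    and recurrently drop those judged absent from the image."""
    queries = list(vocabulary)
    while True:
        masks = car_segment(image, queries)
        kept = filter_queries(queries, masks)  # assumed: removes non-existent concepts
        if kept == queries:                    # stop once no query is filtered out
            return masks
        queries = kept
```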