Clarification on Referring Segmentation
yxchng opened this issue · 1 comments
yxchng commented
Based on the code:
X-Decoder/xdecoder/architectures/xdecoder_model.py
Lines 419 to 424 in 584123d
Is it true that referring segmentation in X-Decoder is done by segmentation -> classification (matching mask with highest similarity)?
MaureenZOU commented
Yes, but we do not match the mask embeddings directly. During training, we compute Hungarian matching between the predicted masks and the ground-truth masks, and between the predicted text embeddings and the ground-truth text embeddings. During evaluation, we use only the text-embedding matching score.
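To make the two stages concrete, here is a minimal sketch of the idea, not the actual X-Decoder code. All function names, the equal weighting of the mask and text costs, and the use of cosine similarity are assumptions for illustration. The training-time matcher below uses a brute-force search over assignments as a stand-in for the Hungarian algorithm; it finds the same minimum-cost assignment but is only practical for tiny inputs.

```python
import math
from itertools import permutations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_training(mask_cost, text_cost):
    """Training-time matching (hypothetical helper).

    mask_cost[p][g] and text_cost[p][g] are the costs of pairing
    prediction p with ground truth g. Returns a list `assign` where
    assign[g] is the prediction index matched to ground truth g.
    Brute force stand-in for the Hungarian algorithm; the equal
    weighting of the two costs is an assumption.
    """
    n_pred = len(mask_cost)
    n_gt = len(mask_cost[0])
    best, best_cost = None, math.inf
    for perm in permutations(range(n_pred), n_gt):
        cost = sum(mask_cost[p][g] + text_cost[p][g] for g, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(best)


def select_mask_eval(pred_text_embs, phrase_emb):
    """Evaluation-time selection (hypothetical helper).

    Only the text-embedding score is used: return the index of the
    prediction whose text embedding is most similar to the embedding
    of the referring phrase. The mask at that index is the output.
    """
    scores = [cosine(e, phrase_emb) for e in pred_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: 3 predictions, 2 ground-truth targets.
mask_cost = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
text_cost = [[0.8, 0.2], [0.1, 0.9], [0.5, 0.5]]
assignment = match_training(mask_cost, text_cost)

pred_text_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
phrase_emb = [0.1, 0.9]
chosen = select_mask_eval(pred_text_embs, phrase_emb)
```

In this toy example, ground truth 0 is matched to prediction 1 and ground truth 1 to prediction 0 during training, while at evaluation prediction 1 is selected because its text embedding is closest to the phrase embedding. In practice a proper Hungarian solver such as `scipy.optimize.linear_sum_assignment` would replace the brute-force loop.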