The loss of referring segmentation
jshilong opened this issue · 3 comments
Thanks for the great work!
In Section 4.1, you mention that the model was pre-trained on "panoptic segmentation, image-text pairs (itp), and referring segmentation", but I can't find the details of how the referring segmentation data is used in Section 3.4. Would you mind providing more details about the referring segmentation loss in the pre-training phase? Or did I miss it?
Thanks
It seems this is essentially a binary classification problem.
Thanks for your interest in our work, and for pointing out that we did not give details on referring segmentation.
- Data preparation: We use all the seg-text pairs from the refcoco(g/+) datasets and exclude the validation sets. In addition, for images that do not have referring segmentation ground truth, we use instance segmentation labels instead (e.g., the category name "person" refers to all person instances).
- Loss function: For each image with ground truth, we perform Hungarian matching between the predictions and the ground truth. Only the text-to-image loss is applied for referring segmentation. For each text, we train the highest-scoring mask proposal toward its ground truth.
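For anyone trying to reproduce this, here is a minimal NumPy sketch of one reading of the loss above: Hungarian matching assigns each text's ground-truth mask to a proposal via a per-pixel BCE cost, and the text-to-image term is a cross-entropy over proposals that pushes each text to score its matched proposal highest. All names (`mask_logits`, `text_scores`, `gt_masks`) are illustrative assumptions, not identifiers from the actual codebase.

```python
# Hypothetical sketch of the described referring-segmentation loss.
# Not the authors' implementation; shapes and names are assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment


def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def _bce(pred_logits, target, eps=1e-6):
    # Mean per-pixel binary cross-entropy between mask logits and a binary mask.
    p = _sigmoid(pred_logits)
    return -(target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps)).mean()


def referring_seg_loss(mask_logits, text_scores, gt_masks):
    """mask_logits: (Q, H, W) mask logits for Q proposals.
    text_scores: (T, Q) similarity of each of T referring texts to each proposal.
    gt_masks:    (T, H, W) binary ground-truth mask per text.
    """
    T, Q = text_scores.shape

    # Hungarian matching: cost of assigning GT mask t to proposal q is the BCE
    # between that proposal's mask and the ground truth.
    cost = np.empty((T, Q))
    for t in range(T):
        for q in range(Q):
            cost[t, q] = _bce(mask_logits[q], gt_masks[t])
    rows, cols = linear_sum_assignment(cost)

    # Mask loss on the matched proposals only.
    mask_loss = cost[rows, cols].mean()

    # Text-to-image loss: softmax cross-entropy over proposals, so each text
    # learns to score its matched proposal highest (applied text->image only).
    log_probs = text_scores - np.log(np.exp(text_scores).sum(axis=1, keepdims=True))
    txt_loss = -log_probs[rows, cols].mean()

    return mask_loss + txt_loss
```

This conflates "Hungarian matching" and "highest-scoring proposal" into a single assignment for simplicity; the actual training code may select the proposal by argmax over `text_scores` instead.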
@MaureenZOU
1. Just to clarify: does "refcoco(g/+)" mean only refcoco+ and refcocog, with refcoco excluded?
2. Could you explain what you mean by using instance segmentation as labels for images that have no referring segmentation ground truth (e.g. person -> all person instances)?