RefCOCO training / evaluation details
j-min opened this issue · 2 comments
Hello,
I have some questions regarding RefCOCO/+/g training / evaluation details.
- Are you going to upload the RefCOCO/+/g training/evaluation code?
- Which boxes did you finetune UNITER on?
- Which boxes did you use for the val, test, val^d, and test^d evaluations, respectively? Did you use the Mask R-CNN boxes from MAttNet?
It seems the ViLBERT-MT authors finetuned their model on 100 BUTD boxes plus the Mask R-CNN boxes from MAttNet -> code.
Then they used only the 100 BUTD boxes during evaluation -> code
I calculated oracle scores on the RefCOCOg val split, counting an expression as correct if there exists a candidate box with IoU(candidate, target) > 0.5 (a sketch of this computation follows the results below):
- Mask R-CNN boxes from MAttNet -> 86.10%
- MS COCO GT boxes -> 99.6%
- ViLBERT-MT's 100 BUTD boxes -> 96.53%
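
For reference, here is a minimal sketch of how such an oracle score can be computed. The helper names and the (x1, y1, x2, y2) box format are my own assumptions, not code from any of the repos above:

```python
import numpy as np

def box_iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def oracle_score(samples, iou_thresh=0.5):
    """Fraction of referring expressions with at least one candidate box
    whose IoU with the ground-truth box exceeds the threshold."""
    hits = sum(
        box_iou(np.asarray(gt_box), np.asarray(cand_boxes)).max() > iou_thresh
        for gt_box, cand_boxes in samples
    )
    return hits / len(samples)
```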
Since the BUTD boxes have better coverage than the Mask R-CNN boxes from MAttNet, I don't think this is a fair comparison to MAttNet. It is also not consistent with the ViLBERT-MT paper.
The ViLBERT-MT authors compared ViLBERT-MT and UNITER on test^d, so I wonder which boxes you used for UNITER finetuning and evaluation.
We finetuned on the ground-truth (COCO-annotated) boxes, with features extracted using BUTD, and ran inference on:
- ground-truth boxes
- MAttNet's detected boxes
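
(For anyone reproducing this, a minimal sketch of such an inference loop, reusing `box_iou` from the sketch above. `score_fn`, the sample layout, and all names here are hypothetical assumptions, not the actual UNITER code:)

```python
def referring_accuracy(score_fn, samples, iou_thresh=0.5):
    """Top-1 accuracy: the model's highest-scoring candidate box must
    overlap the ground-truth box with IoU above the threshold.

    score_fn(expression, boxes, feats) returns one score per candidate box;
    samples is a list of (expression, candidate_boxes, candidate_feats, gt_box),
    where the candidates come from one box source (GT or MAttNet detections).
    """
    correct = 0
    for expression, boxes, feats, gt_box in samples:
        scores = score_fn(expression, boxes, feats)       # model forward pass
        pred = np.asarray(boxes)[int(np.argmax(scores))]  # top-1 predicted box
        correct += box_iou(pred, np.asarray([gt_box]))[0] > iou_thresh
    return correct / len(samples)
```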
Thank you for the clarification!