ChenRocks/UNITER

RefCOCO training / evaluation details

j-min opened this issue · 2 comments

j-min commented

Hello,
I have some questions regarding RefCOCO/+/g training / evaluation details.

  1. Are you going to upload the RefCOCO/+/g training/evaluation code?
  2. Which boxes did you finetune UNITER on?
  3. Which boxes did you use to evaluate on the val, test, val^d, and test^d splits, respectively? Did you use the Mask R-CNN boxes from MAttNet?

Table from UNITER

It seems the ViLBERT-MT authors finetuned their model on 100 BUTD boxes + the Mask R-CNN boxes from MAttNet -> code.
They then used the 100 BUTD boxes during evaluation -> code

I calculated oracle scores on the RefCOCOg val split: a sample counts as correct if there exists a candidate box with IoU(candidate, target) > 0.5.

  - Mask R-CNN boxes from MAttNet -> 86.10%
  - MS COCO ground-truth boxes -> 99.6%
  - ViLBERT-MT's 100 BUTD boxes -> 96.53%
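For reference, the oracle score above can be computed as sketched below. This is my own reconstruction of the procedure described, not code from this repo; `samples` and the box format (`x1, y1, x2, y2`) are assumptions.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) tuples."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def oracle_score(samples):
    """samples: list of (candidate_boxes, target_box) pairs.

    A sample is a hit if any candidate box overlaps the target
    with IoU > 0.5; the oracle score is the fraction of hits.
    """
    hits = sum(
        any(iou(c, target) > 0.5 for c in candidates)
        for candidates, target in samples
    )
    return hits / len(samples)
```

This oracle score is an upper bound on the grounding accuracy any model can reach with a given candidate-box set, which is why the coverage gap between box sources matters for the comparison.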

Since the BUTD boxes have better coverage than the Mask R-CNN boxes from MAttNet, I don't think this is a fair comparison to MAttNet. It is also inconsistent with the ViLBERT-MT paper.

Paragraph from ViLBERT-MT

The ViLBERT-MT authors compared ViLBERT-MT and UNITER on test^d. I wonder which boxes you used for UNITER finetuning and evaluation.

Table from ViLBERT-MT

We finetuned on the ground-truth (COCO-annotated) boxes, whose features were extracted with BUTD, and ran inference on

  1. the ground-truth boxes
  2. MAttNet's detected boxes
j-min commented

Thank you for the clarification!