pzzhang/VinVL

How are the object tags generated?

runzeer opened this issue · 4 comments

In your VQA val2014_qla_mrcnn.json file, I found that the number of object tags does not match the number of features in the feats .pt file. Could you tell me how the object tags are generated?

They are generated by an object detection model trained on COCO. In fact, you can use tags generated by the VinVL models without any accuracy drop. The trick is to keep only the tags that are in the COCO 80-class vocabulary.

Hi Zhang,
Does the "o" represent the object tags? Does the order matter if we replace the object tags with VinVL's classes?
How could we generate the "an" and "s"?

Also, if we use VinVL image features for the Oscar model for the VQA task, why does the code still ask for the mrcnn.json file?
Thank you a lot in advance!

Wow, I think "an" and "s" are not even used! That means all we need is the "q" and "o" (object tags)!!! That makes things much easier!

Yes, @CCYChongyanChen, "an" and "s" are the answers and scores. They are not used in the model, but they are used in the evaluation.

The order does not matter, but it is important to keep only the tags that are in the COCO 80-class vocabulary.
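
For anyone landing here later, the filtering step could look roughly like the sketch below. It is not the authors' script. It assumes the detector output is already available as a list of class-name strings per image, and the helper name `keep_coco_tags` is made up for illustration. The class names are the standard COCO 80 detection labels. VinVL's detector uses a different, larger label set, so some of its names may be spelled differently from the COCO ones and would need a manual mapping that is not shown here.

```python
# Standard COCO 80 detection class names.
COCO_CLASSES = frozenset([
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
    "truck", "boat", "traffic light", "fire hydrant", "stop sign",
    "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag",
    "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite",
    "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon",
    "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot",
    "hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant",
    "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote",
    "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
    "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
    "hair drier", "toothbrush",
])


def keep_coco_tags(tags):
    """Hypothetical helper: drop every tag outside the COCO 80-class vocabulary.

    Tags are lower-cased and stripped before the lookup. The input order is
    preserved, although the thread says order does not matter.
    """
    normalized = (tag.strip().lower() for tag in tags)
    return [tag for tag in normalized if tag in COCO_CLASSES]


if __name__ == "__main__":
    # Made-up detector output for one image (assumption, not real data).
    detected = ["man", "dog", "grass", "frisbee", "sky"]
    print(keep_coco_tags(detected))
```

How the filtered tags are then written into the "o" field of the JSON file (for example as a single space-separated string) is not stated in this thread, so check the format of an existing entry before overwriting it.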