How are the object tags generated?

Question

runzeer opened this issue 3 years ago · 4 comments

In your VQA val2014_qla_mrcnn.json file, I found that the number of object tags does not match the number of features in the feats .pt file. Could you tell me how the object tags are generated?

Answer 1 · 2021-04-16T21:49:01.000Z

They are generated by an object detection model trained on COCO. In fact, you can use tags generated by the VinVL models without any accuracy drop. The trick is that you need to keep only the tags that are in the COCO 80-class vocabulary.
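A minimal sketch of the filtering step described above, assuming the detector (e.g. VinVL) gives you a list of label strings per image and that the object-tag field is one space-separated string of tags. The function names `filter_to_coco` and `object_tag_string` are hypothetical, not part of the Oscar codebase. Matching is exact and case-insensitive, which is an assumption: a detector label that is a synonym of a COCO class (e.g. "television" for "tv") is dropped unless you map it first.

```python
from typing import Iterable, List

# The 80 COCO detection category names, spelled as in the COCO 2017
# annotation files. Labels from another vocabulary (e.g. VinVL's
# Visual Genome classes) only survive if they use exactly these spellings.
COCO_80_CLASSES = frozenset([
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
    "truck", "boat", "traffic light", "fire hydrant", "stop sign",
    "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella",
    "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
    "sports ball", "kite", "baseball bat", "baseball glove", "skateboard",
    "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork",
    "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange",
    "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair",
    "couch", "potted plant", "bed", "dining table", "toilet", "tv",
    "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave",
    "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase",
    "scissors", "teddy bear", "hair drier", "toothbrush",
])


def filter_to_coco(labels: Iterable[str]) -> List[str]:
    """Keep only labels that exactly match a COCO-80 class name (case-insensitive)."""
    normalized = (label.strip().lower() for label in labels)
    return [label for label in normalized if label in COCO_80_CLASSES]


def object_tag_string(labels: Iterable[str]) -> str:
    """Join the kept labels into one space-separated tag string."""
    return " ".join(filter_to_coco(labels))
```

For example, `object_tag_string(["man", "Dog", "frisbee", "grass"])` returns `"dog frisbee"`: "man" and "grass" are valid VinVL-style labels but are not COCO classes, so they are discarded. Because the filtered tags come from a different detector than the region features, the tag count and the feature count for an image need not match.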

Answer 2 · 2021-10-14T03:01:58.000Z

Hi Zhang,
Does "o" represent the object tags? Does the order matter if we replace the object tags with VinVL's classes?
How can we generate "an" and "s"?

Also, if we use VinVL image features for the Oscar model on the VQA task, why does the code still ask for the mrcnn.json file?
Thanks a lot in advance!

Answer 3 · 2021-10-14T20:19:08.000Z

Wow, I think "an" and "s" are not even used! That means all we need is the "q" and "o" (object tags)!!! That makes things much easier!

Answer 4 · 2021-10-15T05:56:24.000Z

Yes, @CCYChongyanChen, "an" and "s" are the answers and scores. They are not used by the model, only in evaluation.
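To illustrate why "an" and "s" matter only at evaluation time, here is a hedged sketch of how per-question answer scores could be looked up. It assumes that "an" and "s" are parallel lists (candidate answers, as ids or strings, and their soft scores) and that predictions are listed in the same order as the entries. This is not the repository's evaluation code; `answer_score` and `mean_vqa_score` are hypothetical names.

```python
from typing import Hashable, Mapping, Sequence


def answer_score(predicted: Hashable,
                 answers: Sequence[Hashable],
                 scores: Sequence[float]) -> float:
    """Score of the predicted answer: its value in "s", or 0.0 if it is not in "an"."""
    # Assumption: "an" and "s" are parallel lists of equal length.
    return dict(zip(answers, scores)).get(predicted, 0.0)


def mean_vqa_score(predictions: Sequence[Hashable],
                   entries: Sequence[Mapping]) -> float:
    """Average answer score; only "an" and "s" are read, never "q" or "o"."""
    if len(predictions) != len(entries):
        raise ValueError("need exactly one prediction per entry")
    if not entries:
        return 0.0
    total = sum(answer_score(pred, entry["an"], entry["s"])
                for pred, entry in zip(predictions, entries))
    return total / len(entries)
```

The model itself would only ever see "q" and "o"; a prediction that is missing from "an" simply earns a score of 0.0.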

The order does not matter, but it is important to keep only the tags that are in the COCO 80-class vocabulary.