mertyg/vision-language-models-are-bows

Concrete benchmark results for attribute understanding

Yangyi-Chen opened this issue · 5 comments

Hi, thanks for your great work!

Could you also provide concrete results for attribute understanding? Currently, the paper only shows figures, plus some concrete benchmark results for relation understanding in the Appendix.

Many thanks!

vinid commented

Hello!!! Thank you!!

Could you elaborate a bit more on what you'd like to see?

vinid commented

Closing this for now, but feel free to reopen it!

Hi! Sorry for the late reply.

I would like to see fine-grained results on the Visual Genome Attribute dataset, like the ones Table 2 shows for the Visual Genome Relation dataset.

If collecting those results is too tedious, could you just provide each model's performance on the attribute dataset?

vinid commented

Hello!

The problem with that table is that it's very long and rather difficult to read. I'll look for the CSV with the attribute results, but if you just want to collect them, it might be easier and faster to use the Colab notebook!

Great, thanks!