he-dhamo/simsg

Fewer val and test images for Visual Genome

tyler-hayes opened this issue · 1 comment

Hello!

The paper reports the final Visual Genome dataset statistics after filtering as follows (Appendix Section 6.3):

  • 62,565 train images
  • 5,506 val images
  • 5,088 test images

However, after running preprocess_vg.py to filter the Visual Genome dataset I obtain the following statistics:

  • 62,565 train images
  • 5,062 val images
  • 5,096 test images
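To make the gap explicit, here is a minimal sketch that computes the per-split difference between the two sets of counts. The numbers are copied from the lists above; the names `reported`, `obtained`, and `split_differences` are illustrative and not part of the simsg code.

```python
# Counts copied from the two lists above; all names here are illustrative.
reported = {"train": 62565, "val": 5506, "test": 5088}  # paper, Appendix Section 6.3
obtained = {"train": 62565, "val": 5062, "test": 5096}  # after running preprocess_vg.py


def split_differences(reported, obtained):
    """Return obtained minus reported for each split."""
    return {split: obtained[split] - reported[split] for split in reported}


diffs = split_differences(reported, obtained)
for split, delta in diffs.items():
    print(f"{split}: {delta:+d}")
```

So train matches exactly, val comes out 444 images short, and test has 8 images more than reported.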

I'm using Python 3.8.10 and downloaded version 1.4 of the Visual Genome dataset from this link.

Do you have any idea why the number of train images matches for me, but the numbers of val and test images do not?

Thank you in advance!

Hey, sorry for the late reply! We reported the numbers from the sg2im paper (https://arxiv.org/pdf/1804.01622.pdf), where the dataset preprocessing originated. However, after taking another look at the generated dataset, we get the same counts as you. My guess is that the preprocessing script may have been slightly altered since those numbers were computed.