jackroos/VL-BERT

Could only download 400k images

gsrivas4 opened this issue · 5 comments

I am trying to use your script to download the Conceptual Captions dataset, but I was able to download only 400k of the 3M images in the training set. I ran the script 5 times in case the failures came from unreliable servers, but many of the image links no longer seem to work. If you have the images downloaded somewhere, would it be possible to share the dataset?
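For anyone re-running a download like this several times, a resumable loop that records which links fail can help separate dead URLs from flaky servers. This is only a minimal sketch, not the repository's script: `download_all`, `fetch`, the `<index>.jpg` file naming, and the retry count are all assumptions made for illustration.

```python
import http.client
import os
import urllib.request


def fetch(url, timeout=10):
    """Return the raw bytes at `url`; raises on HTTP or network errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_all(urls, out_dir, fetch_fn=fetch, retries=2):
    """Download each URL to out_dir/<index>.jpg (hypothetical naming).

    Files that already exist are skipped, so the function can be re-run
    after an interruption. Returns (ok, failed) lists of (index, url).
    """
    os.makedirs(out_dir, exist_ok=True)
    ok, failed = [], []
    for idx, url in enumerate(urls):
        path = os.path.join(out_dir, f"{idx}.jpg")
        if os.path.exists(path):
            ok.append((idx, url))
            continue
        for _ in range(retries + 1):
            try:
                data = fetch_fn(url)
            except (OSError, ValueError, http.client.HTTPException):
                # URLError, HTTPError and socket timeouts are OSError
                # subclasses; ValueError covers malformed URLs.
                continue
            with open(path, "wb") as f:
                f.write(data)
            ok.append((idx, url))
            break
        else:
            # Every attempt failed: record the link instead of stopping.
            failed.append((idx, url))
    return ok, failed
```

Writing the `failed` list to a file after each run makes it possible to see whether the same links fail every time (likely dead) or only intermittently (likely a flaky server).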

I ran into the same problem. I downloaded the VCR dataset and hit this error:
FileNotFoundError: [Errno 2] No such file or directory: './data/vcr/vcr1images/movieclips_The_Jackal/AA4zkmfbFD0@0.json'
Do you have the complete dataset?
Thank you.

@menggehe No, I could not download the whole dataset. I am using only those 400k training images. It would be good to get the complete dataset, though.

I am sorry, but I have not found a way to share such a large dataset for now. @gsrivas4

I got it. Reply to me and I will send it to you.

Why do you need all 3M images? What for?