amazon-science/prompt-pretraining

Which version of ImageNet21K should I download

Closed this issue · 8 comments

image
Which version of ImageNet21K should I download? The green one or the red one? Do prompt pre-training and Detic use the red one? The red one is about 280G, is that normal?

Hi,
(1) Yes, you should download the red one.
(2) Yes, downloading ImageNet-21k requires a huge amount of disk space.

I downloaded the red one, but the MD5 checksum didn't match. Is that normal?
image
image

I don't think it is normal. It seems that you did not download the complete file.
However, if you just want to reproduce the prompt pre-training experiment, I think it's ok to use the imperfect data, because our pre-training only samples up to 16 images for each class.
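
For anyone hitting the same mismatch, here is a minimal sketch of checking the checksum locally before unpacking. The helper names `file_md5` and `matches_expected` are made up for illustration, and the expected value has to be copied from the official download page; nothing here comes from this repository.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks.

    Chunked reading keeps memory use flat even for a ~280G archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare a file's MD5 with a published value, ignoring case and whitespace."""
    return file_md5(path) == expected_md5.strip().lower()


# Example usage (the path and checksum below are placeholders, not real values):
# print(matches_expected("imagenet21k_resized.tar.gz", "<md5 from the download page>"))
```

A mismatch on a file this large usually means a truncated or corrupted transfer, so re-downloading (ideally with a tool that can resume) is the likely fix.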

I unzipped the red one, and the directory structure looks like this. Is that normal?
image
image

I unzipped the red one and got an imagenet21k_small_classes directory. It has only 5592 folders, fewer than 8718.
Are the two pink rectangles irrelevant to ImageNet21K?
Where do the two folders in the yellow rectangles (imagenet21k_train and imagenet21k_val) come from? From the green one?
image
image

Hi, I think you didn't completely download the dataset from the red one. As you said, imagenet21k_resized.tar.gz is 280.51 GB. If you download it completely, it should contain 3 folders after decompression, i.e., imagenet21k_train, imagenet21k_val, and imagenet21k_small_classes.

On my end, after decompression, the sizes are:

  • imagenet21k_small_classes: 33G
  • imagenet21k_val: 12G
  • imagenet21k_train: 250G

Please check the size of your downloaded archive and of the decompressed folders. If the archive is significantly less than 280G, or a folder is much smaller than the size listed above, that indicates the full archive was not downloaded.
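
To compare against the sizes listed above, a rough sketch like the following counts the immediate subfolders of a directory and sums the sizes of all files under it. `summarize_dir` is a hypothetical helper name and the paths in the usage comment are assumptions about where the archive was unpacked. It sums file sizes rather than disk blocks, so the result can differ slightly from what `du` reports.

```python
import os


def summarize_dir(root):
    """Return (number of immediate subfolders, total size in bytes of all files)."""
    # Immediate subfolders only, e.g. one folder per class.
    n_subdirs = sum(1 for entry in os.scandir(root) if entry.is_dir())

    total_bytes = 0
    # Walk the whole tree and add up regular file sizes.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return n_subdirs, total_bytes


# Example usage (folder names taken from the thread, location assumed):
# for folder in ("imagenet21k_train", "imagenet21k_val", "imagenet21k_small_classes"):
#     count, size = summarize_dir(folder)
#     print(f"{folder}: {count} subfolders, {size / 1024**3:.1f} GiB")
```

If the subfolder count or total size comes out far below what a complete extraction gives, the archive was most likely truncated during download.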

Thank you for your reply. Based on it, I found a problem with my download process. After downloading the archive three times, I finally got the same decompression result as yours, and the MD5 checksum matches the one on the official website.