Which version of ImageNet21K should I download?
Hi,
(1) Yes, you should download the red one.
(2) Yes, downloading ImageNet-21k requires a huge amount of disk space.
I don't think that is normal. It seems that you did not download the complete file.
However, if you just want to reproduce the prompt pre-training experiment, I think it's OK to use the incomplete data, because our pre-training samples at most 16 images per class.
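To illustrate why an incomplete download can still be enough, here is a minimal sketch of sampling at most 16 images per class. It is not the repository's actual sampling code. It assumes the training folder contains one subfolder per class, and the function name `sample_per_class` and its parameters are hypothetical.

```python
import random
from pathlib import Path


def sample_per_class(train_dir, shots=16, seed=0):
    """Return {class_name: [image paths]} with at most `shots` images per class.

    Assumes `train_dir` holds one subfolder per class (an assumption about
    the layout, not something confirmed in this thread).
    """
    rng = random.Random(seed)
    subset = {}
    for class_dir in sorted(Path(train_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        images = sorted(p for p in class_dir.iterdir() if p.is_file())
        # Classes with fewer than `shots` images contribute everything they have.
        subset[class_dir.name] = rng.sample(images, min(shots, len(images)))
    return subset
```

With this kind of sampling, a class only needs 16 readable images, so a few missing files in a class do not necessarily change the sampled subset size.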
Hi, I think the download from the red one did not complete. As you said, imagenet21k_resized.tar.gz is 280.51 GB. If it is downloaded completely, it should contain 3 folders after decompression: imagenet21k_train, imagenet21k_val, and imagenet21k_small_classes.
On my end, after decompression, the sizes are:
- imagenet21k_small_classes: 33G
- imagenet21k_val: 12G
- imagenet21k_train: 250G
Please compare your folder sizes against these figures. If any of them is significantly smaller (in particular imagenet21k_train, which should be about 250G), that indicates the full archive was not downloaded.
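If it helps, the size check can be scripted. This is a rough sketch that sums file sizes for each top-level folder under the extraction directory. The helper names `dir_size_gib` and `report` are made up for illustration, and the root path in the usage line is a placeholder. The totals should be close to what `du -sh` prints, though not identical, since `du` reports disk usage rather than summed file sizes.

```python
import os


def dir_size_gib(path):
    """Sum the sizes of all regular files under `path`, in GiB."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total / 2**30


def report(root):
    """Print the size of every top-level folder under `root`."""
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        if entry.is_dir():
            print(f"{entry.name}: {dir_size_gib(entry.path):.1f} GiB")
```

Usage would be something like `report("/path/to/extracted")`, replacing the placeholder with wherever the archive was unpacked.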
Thank you for your reply. Based on it, I located a problem in my download process. After downloading the archive three times, I finally got the same decompression result as yours, and the MD5 checksum matches the one on the official website.
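For anyone who hits the same problem, checking the MD5 before decompressing saves a lot of time. Below is a small sketch using Python's standard `hashlib`. The function name `md5_of_file` is just for illustration, and the expected checksum has to be copied from the official website (it is not reproduced here).

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so a multi-hundred-GB archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the result of `md5_of_file("imagenet21k_resized.tar.gz")` with the published value. If they differ, the download is incomplete or corrupted and should be repeated.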