About Dataset

Question

Closed this issue 2 years ago · 9 comments

Hi, thank you very much for open-sourcing this. I am very interested in your paper. ImageNet10, ImageNet20, and ImageNet100 in your paper are subsets of ImageNet (ILSVRC-2012). Are they drawn from the training set, the test set, or the val set of ImageNet (ILSVRC-2012)?

Answer 1 · 2023-05-28T15:39:59.000Z

Hi, I saw in your other issues, "The training split is used for other baseline or OOD methods that require fine-tuning, such as Mahalanobis scores using the training set to estimate class means and covariance matrices.".
If I want to fine-tune, which ImageNet training set should I download from the official ImageNet website: the 138GB one (Task 1 & 2) or the 728MB one (Task 3)?

Answer 2 · 2023-05-29T09:23:01.000Z

Hi! Thanks for your interest in our work. If you want to fine-tune, just use the normal training set (e.g., the 138GB one for ImageNet-1k/ILSVRC2012).

Answer 3 · 2023-05-29T12:37:02.000Z

Thanks a lot for your answer. In Table 1 of your paper, does the ImageNet100 used belong to the training set, val set, or test set of ImageNet1k? Can the ImageNet100 for this experiment be obtained with the create_imagenet_subset.py script?

Answer 4 · 2023-05-29T14:14:41.000Z

I downloaded the ImageNet1k dataset and found that the val and test sets inside contain only loose images, with no folders organized by category. It seems there is no way to use create_imagenet_subset.py to extract the ImageNet100 subset from them. Is the ImageNet-100 used in Table 1 of your paper extracted from the ImageNet1k training set?
Can you provide the ImageNet-100, 20, 30 datasets used in Table 1 in the paper? Sincerely thank you for your reply.

Answer 5 · 2023-05-30T07:29:35.000Z

Yes, all ImageNet subsets can be created using create_imagenet_subset.py. Class IDs are provided under data/[ID_dataset]/class_list.txt. The create_imagenet_subset.py script will create a train and a val split for each ImageNet subset from the original ImageNet-1k.

Table 1 reports test results for the MCM score, where each ID dataset refers to the val (test) split of ImageNet or one of its subsets. No training is involved.

When you download and extract ImageNet-1k, the folder structure should look like this (so that it can be loaded with torchvision.datasets.ImageFolder):

ROOT_DIR
|-- train
      |-- n01440764
      |-- n01755581
      |-- ...
|-- val
      |-- n01440764
      |-- n01755581
      |-- ...
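For readers following along, here is a minimal sketch of how a subset could be carved out of that layout. It is not the repository's create_imagenet_subset.py, whose internals are not shown in this thread. The helper names `read_class_list` and `build_subset` are hypothetical, and the sketch assumes class_list.txt holds one WordNet ID (such as n01440764) per line, which may not match the actual file format.

```python
import shutil
from pathlib import Path


def read_class_list(path):
    """Return the non-empty lines of a class list file.

    Assumption: one WordNet ID per line; the real class_list.txt may differ.
    """
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_subset(root_dir, out_dir, class_ids, splits=("train", "val")):
    """Copy the chosen class folders from each split of root_dir into out_dir."""
    root_dir, out_dir = Path(root_dir), Path(out_dir)
    for split in splits:
        for wnid in class_ids:
            src = root_dir / split / wnid
            if not src.is_dir():
                # A flat val folder (no class subfolders) ends up here.
                raise FileNotFoundError(f"missing class folder: {src}")
            shutil.copytree(src, out_dir / split / wnid, dirs_exist_ok=True)
    return out_dir
```

The resulting out_dir has the same train/val layout as ROOT_DIR, so it should load with torchvision.datasets.ImageFolder in the same way. Copying is used here for simplicity; symlinks would save disk space.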

Answer 6 · 2023-05-30T08:09:18.000Z

Hi, thank you very much for your reply. In other words, you only use the val subset of ImageNet1K in Table 1 of the paper?
I downloaded the val dataset of ILSVRC2012 from ImageNet's official website. After downloading and opening, I found that there are no folders classified by category. I downloaded it from this link.
https://image-net.org/
I'm not sure if I downloaded the wrong file. Can you provide me the link or file to download the ImageNet dataset from your paper? Thanks again for your reply.

Answer 7 · 2023-05-30T08:15:40.000Z

> In other words, you only use the val subset of ImageNet1K in Table 1 of the paper?

Yes, the ID val subset is used in Table 1.

Thanks for the feedback. Normally, ImageNet-1k downloaded from the official website should have the structure above. I will double-check whether things have changed.

Answer 8 · 2023-05-31T09:37:03.000Z

I just checked, and the dataset downloaded from the official website seems to be intact. Have you used the standard preprocessing steps? https://github.com/facebookarchive/fb.resnet.torch/blob/master/INSTALL.md#download-the-imagenet-dataset
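As a rough illustration of what that preprocessing achieves for the val split, the sketch below moves flat validation images into per-class subfolders. It is not the script from the linked instructions. The function name `sort_val_images` is hypothetical, and the sketch assumes you already have a filename-to-WordNet-ID mapping (for example, one derived from the development kit's validation labels); building that mapping is not covered here.

```python
import shutil
from pathlib import Path


def sort_val_images(val_dir, filename_to_wnid):
    """Move each flat validation image into a subfolder named after its class.

    filename_to_wnid is an assumed, user-supplied dict such as
    {"ILSVRC2012_val_00000001.JPEG": "n01751748"}.
    Returns the number of files moved.
    """
    val_dir = Path(val_dir)
    moved = 0
    for filename, wnid in filename_to_wnid.items():
        src = val_dir / filename
        if not src.is_file():
            continue  # already moved on an earlier run, or not present
        class_dir = val_dir / wnid
        class_dir.mkdir(exist_ok=True)
        shutil.move(str(src), str(class_dir / filename))
        moved += 1
    return moved
```

After this step the val folder has one subfolder per class, matching the ROOT_DIR/val layout that torchvision.datasets.ImageFolder expects. Running the function a second time moves nothing, because the source files are no longer in the top-level folder.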

Answer 9 · 2023-05-31T09:46:09.000Z

Thanks for your reply. I had not run the preprocessing steps. Sorry, I mistakenly thought that the downloaded dataset was already divided into subfolders by category. Thanks again for your reply and help.