About ImageNet-100
shuaiNJU opened this issue · 5 comments
Hi,
Could you provide a download link for the ImageNet-100 dataset, or the code used to randomly select ImageNet-100 from ImageNet-1k? Thanks a lot!
Sure! You can use the following script to create an ImageNet-K subset (e.g., K = 100) from ImageNet-1k. Just set --src-dir to your ImageNet-1k path.
import os
import shutil
from tqdm import tqdm
import argparse
import random
parser = argparse.ArgumentParser(description='Create ImageNet-100 subset',
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('--K', default=100, type=int, help='num of classes to be subsampled')
parser.add_argument('--src-dir', default='/path/to/imagenet-1k', type=str,
                    help='path to ImageNet-1k')
parser.add_argument('--dst-dir', default='datasets/imagenet-100', type=str,
                    help='root dir of in_dataset')
args = parser.parse_args()
os.makedirs(args.dst_dir, exist_ok=True)
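# Subsample K classes from ImageNet-1k, considering only class directories
train_dir = os.path.join(args.src_dir, 'train')
all_classes = sorted(d for d in os.listdir(train_dir)
                     if os.path.isdir(os.path.join(train_dir, d)))
class_names = random.sample(all_classes, args.K)
for split in ['train', 'val']:
    # Copy the same K class folders for both splits
    for cls in tqdm(class_names):
        # dirs_exist_ok requires Python 3.8+
        shutil.copytree(os.path.join(args.src_dir, split, cls),
                        os.path.join(args.dst_dir, split, cls), dirs_exist_ok=True)
    print(f'### Created imagenet-{args.K} {split} ###')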
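If the same subset needs to be rebuilt later, the selection step can be made reproducible. The sketch below is not part of the script above: select_classes, save_class_list, and the seed parameter are hypothetical names introduced here for illustration. It assumes the train directory holds one subfolder per class.

```python
import os
import random
import tempfile


def select_classes(train_dir, k, seed=0):
    """Return k class folder names drawn reproducibly from train_dir."""
    # Sort first: os.listdir order is arbitrary, which would defeat the seed.
    classes = sorted(
        d for d in os.listdir(train_dir)
        if os.path.isdir(os.path.join(train_dir, d))
    )
    if k > len(classes):
        raise ValueError(f'requested {k} classes but found only {len(classes)}')
    # A private Random instance leaves the global random state untouched.
    return random.Random(seed).sample(classes, k)


def save_class_list(class_names, path):
    """Write one class name per line so the subset can be rebuilt later."""
    with open(path, 'w') as f:
        f.write('\n'.join(class_names) + '\n')


if __name__ == '__main__':
    # Demo on a throwaway tree with made-up class folder names.
    with tempfile.TemporaryDirectory() as root:
        for i in range(5):
            os.makedirs(os.path.join(root, f'class_{i}'))
        chosen = select_classes(root, 3, seed=0)
        save_class_list(chosen, os.path.join(root, 'classes.txt'))
        print(chosen)
```

Keeping the saved class list next to the copied data makes it possible to confirm later which classes a given ImageNet-K subset contains.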
Excuse me, does the line for split in ['train', 'val'] mean that the validation set of the randomly selected ImageNet-100 consists of the same 100 classes as the selected training set? In other words, is the ImageNet validation set already organized into class folders? Thanks!
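Hi! Yes, the in-distribution (ID) validation set is used to measure ID classification performance, so it needs to share the same set of classes as the training set. Note that the script copies val/<class> folders, so it assumes your ImageNet-1k validation set is already organized into one folder per class.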
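One way to confirm that the two splits of a created subset really share the same classes is a quick folder comparison. This is a minimal sketch, assuming the train/<class> and val/<class> layout used by the script above; class_folders and check_splits_match are hypothetical helper names, not part of the repository.

```python
import os


def class_folders(split_dir):
    """Return the set of class subfolder names found in split_dir."""
    return {
        d for d in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, d))
    }


def check_splits_match(root):
    """Return the sorted class list, or raise if train and val differ."""
    train = class_folders(os.path.join(root, 'train'))
    val = class_folders(os.path.join(root, 'val'))
    if train != val:
        raise ValueError(
            f'only in train: {sorted(train - val)}; '
            f'only in val: {sorted(val - train)}'
        )
    return sorted(train)
```

Calling check_splits_match('datasets/imagenet-100') after running the script should return the K selected class names; an error here usually means the source validation set was not arranged into class folders.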
Got it! Could you also provide the script for evaluating ImageNet-100, including which score you use (KNN or Maha) and the value of K? Thanks a lot!
Hi! The same script (eval_ood.py) can be used for evaluating ImageNet-100: just specify --in_dataset as 'ImageNet-100' (along with the corresponding hyperparameters used for fine-tuning on ImageNet-100). We use KNN as the default score, as shown in the paper, but Maha also works well.
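For readers unfamiliar with the KNN score mentioned above, the general idea can be sketched as follows. This is not the code in eval_ood.py; it is a pure-Python illustration of a k-th nearest neighbor distance score on L2-normalized features, and the names l2_normalize, knn_score, and train_bank are assumptions made for this example. A real feature bank would normally use a vectorized or approximate nearest neighbor library instead.

```python
import math


def l2_normalize(vec):
    """Scale vec to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError('cannot normalize a zero vector')
    return [x / norm for x in vec]


def knn_score(query, train_bank, k):
    """Negative distance from query to its k-th nearest training feature.

    train_bank must already be L2-normalized. Higher scores suggest the
    query is closer to the in-distribution training data.
    """
    if not 1 <= k <= len(train_bank):
        raise ValueError('k must be between 1 and the size of train_bank')
    q = l2_normalize(query)
    dists = sorted(math.dist(q, t) for t in train_bank)
    return -dists[k - 1]


if __name__ == '__main__':
    # Toy 2-D features standing in for network embeddings.
    bank = [l2_normalize(v) for v in [[1, 0], [1, 0.05], [0.9, 0.1], [1, -0.1]]]
    print(knn_score([1, 0.02], bank, k=2))   # near the bank
    print(knn_score([-1, 0.3], bank, k=2))   # far from the bank
```

A sample is then flagged as out-of-distribution when its score falls below a threshold chosen on ID data; the right value of k depends on the dataset and should be taken from the paper or the repository's configuration rather than from this sketch.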