A question about preprocessing the ImageNet dataset

Question

shengwubin opened this issue 3 years ago · 2 comments

Hi,

I am very interested in your ProtoNCE paper and tried to run the unsupervised training example from your README. However, I got stuck at loading the ImageNet dataset during training.

The problem is that I cannot find any code to generate the train folder:

    # Data loading code
    traindir = os.path.join(args.data, 'train')
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
    
    if args.aug_plus:
        # MoCo v2's aug: similar to SimCLR https://arxiv.org/abs/2002.05709
        augmentation = [
            transforms.RandomResizedCrop(224, scale=(0.2, 1.)),
            transforms.RandomApply([
                transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)  # not strengthened
            ], p=0.8),
            transforms.RandomGrayscale(p=0.2),
            transforms.RandomApply([pcl.loader.GaussianBlur([.1, 2.])], p=0.5),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize
        ]
    else:
        # MoCo v1's aug: same as InstDisc https://arxiv.org/abs/1805.01978
        augmentation = [
            transforms.RandomResizedCrop(224, scale=(0.2, 1.)),
            transforms.RandomGrayscale(p=0.2),
            transforms.ColorJitter(0.4, 0.4, 0.4, 0.4),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize
        ]
        
    # center-crop augmentation 
    eval_augmentation = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize
        ])    
       
    train_dataset = pcl.loader.ImageFolderInstance(
        traindir,
        pcl.loader.TwoCropsTransform(transforms.Compose(augmentation)))
    eval_dataset = pcl.loader.ImageFolderInstance(
        traindir,
        eval_augmentation)

I used your code to download the full VOC2007 dataset and extracted it to the VOCdevkit folder. However, traindir is built as os.path.join(args.data, 'train'), so the script expects a train folder inside the dataset folder. Where does this folder come from?

Best,
Wubin

Answer 1 · 2021-12-15T09:34:36.000Z

Hi,
Could you clarify whether you are referring to ImageNet or VOC? The code you show above is for ImageNet; the VOC dataset is loaded by different code.
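
For the ImageNet case, here is a rough sketch of what the train folder is expected to look like. It assumes that pcl.loader.ImageFolderInstance follows the torchvision ImageFolder convention, that is, one subfolder per class under args.data/train with the images inside (worth confirming in pcl/loader.py). The helper name summarize_train_dir and the extension list are made up for illustration and are not part of the repository.

```python
import os

# Assumed set of image extensions; torchvision's ImageFolder accepts a similar list.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".ppm", ".tif", ".tiff", ".webp")


def summarize_train_dir(data_root):
    """Return {class_folder: image_count} for <data_root>/train.

    Expected layout (ImageFolder style, an assumption about the loader):
        <data_root>/train/<class_name>/<image files>
    """
    traindir = os.path.join(data_root, "train")  # same join as in the training script
    if not os.path.isdir(traindir):
        raise FileNotFoundError(
            f"No 'train' folder under {data_root!r}; the script expects {traindir!r}"
        )

    counts = {}
    for entry in sorted(os.scandir(traindir), key=lambda e: e.name):
        if not entry.is_dir():
            continue  # loose files directly under train/ are not treated as classes
        n_images = 0
        for _, _, files in os.walk(entry.path):
            n_images += sum(1 for f in files if f.lower().endswith(IMAGE_EXTENSIONS))
        counts[entry.name] = n_images
    return counts
```

Calling summarize_train_dir on the directory passed as args.data should list one entry per class folder (for ImageNet, synset IDs such as n01440764). A FileNotFoundError here means the train folder is missing, which is what happens when the path points at an extracted VOCdevkit folder instead of an ImageNet directory.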

Answer 2 · 2021-12-21T07:23:56.000Z

@LiJunnan1992 Thanks for the reply. I am referring to the VOC dataset.