msamogh/nonechucks

SafeDataset can't wrap different datasets

eejackliu opened this issue · 11 comments

trainset_ = my_data((140,224),transform=image_transform)
testset_ = my_data((140,224),image_set='val',transform=image_transform)

trainset = nc.SafeDataset(trainset_)
testset = nc.SafeDataset(testset_)

Before executing trainset = nc.SafeDataset(trainset_), I plotted the first image of the test set with plt.imshow(testset_[0][0].permute(1,2,0)) and got the first image of the val set, which is correct. After executing trainset = nc.SafeDataset(trainset_), the same call plt.imshow(testset_[0][0].permute(1,2,0)) shows the first image of the train set, which is wrong. I checked the image path in the testset object, and it's still the path of the first image in the val set. Could you give me some suggestions for solving this problem?

from PIL import Image as image
import torch
import numpy as np
import torchvision.transforms.functional as TF
import torchvision.transforms as transforms
import os
import torchvision.datasets as dset
voc_colormap = [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0],
                [0, 0, 128], [128, 0, 128], [0, 128, 128], [128, 128, 128],
                [64, 0, 0], [192, 0, 0], [64, 128, 0], [192, 128, 0],
                [64, 0, 128], [192, 0, 128], [64, 128, 128], [192, 128, 128],
                [0, 64, 0], [128, 64, 0], [0, 192, 0], [128, 192, 0],
                [0, 64, 128]]
class my_data(torch.utils.data.Dataset):
    #if target_transform=mask_transform is
    def __init__(self,data_size,root='data',image_set='train',transform=None,target_transform=None):
        self.shape=data_size
        self.root=os.path.expanduser(root)
        self.transform=transform
        self.target_transform=target_transform
        self.image_set=image_set
        voc_dir=os.path.join(self.root,'VOCdevkit/VOC2012')
        image_dir=os.path.join(voc_dir,'JPEGImages')
        mask_dir=os.path.join(voc_dir,'SegmentationClass')
        splits_dir=os.path.join(voc_dir,'ImageSets/Segmentation')
        splits_f=os.path.join(splits_dir, self.image_set + '.txt')
        with open(os.path.join(splits_f),'r') as f:
            file_name=[x.strip() for x in f.readlines()]
        self.image=[os.path.join(image_dir,x+'.jpg') for x in file_name]
        self.mask=[os.path.join(mask_dir,x+'.png') for x in file_name]
        assert (len(self.image)==len(self.mask))

        self.class_index=np.zeros(256**3)
        for i,j in enumerate(voc_colormap):
            tmp=(j[0]*256+j[1])*256+j[2]
            self.class_index[tmp]=i
    def __getitem__(self, index):
        img=image.open(self.image[index]).convert('RGB')
        target=image.open(self.mask[index]).convert('RGB')
        i,j,h,w=transforms.RandomCrop.get_params(img,self.shape)
        # if i<0 or j<0 or h <0 or w<0:
        #     return None,None
        img=TF.crop(img,i,j,h,w)
        target=TF.crop(target,i,j,h,w)
        if  self.target_transform is not None:
            return self.transform(img),self.target_transform(target)
        target=np.array(target).transpose(2,0,1).astype(np.int32)
        target=(target[0]*256+target[1])*256+target[2]
        target=self.class_index[target]
        return self.transform(img),target

    def __len__(self):
        return len(self.image)

I tried to check the image path in the testset object, and it's still the path of the first image in the val set.

How did you check this?

Also, are you sure that something like this isn't happening?
Maybe the first (or first few) element(s) of your training set is invalid, and nonechucks has dropped them, and is showing you the first valid image, which might be the same as the first image of the test dataset?

This is a pretty far-fetched case, but just want to confirm.
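
The scenario in this comment can be sketched with a toy wrapper. SkippingDataset and ToyDataset below are hypothetical names used only for illustration; this is not nonechucks' implementation, just a minimal sketch assuming a wrapper that hides samples whose loading fails. It shows how index 0 of the wrapped dataset can point at a later element of the raw dataset.

```python
class ToyDataset:
    """Hypothetical stand-in for a real dataset; None marks a broken sample."""

    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        item = self.items[index]
        if item is None:
            raise ValueError("sample %d could not be loaded" % index)
        return item


class SkippingDataset:
    """Hypothetical wrapper (not nonechucks' code) that hides failing samples."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.valid_indices = []
        for i in range(len(dataset)):
            try:
                dataset[i]
            except Exception:
                continue  # treat any loading error as an invalid sample
            self.valid_indices.append(i)

    def __len__(self):
        return len(self.valid_indices)

    def __getitem__(self, index):
        # Index 0 of the wrapper maps to the first *valid* raw index.
        return self.dataset[self.valid_indices[index]]


raw = ToyDataset([None, None, "img_2", "img_3"])
safe = SkippingDataset(raw)
first_safe_item = safe[0]  # comes from raw index 2, not raw index 0
```

In this sketch the wrapped and raw datasets disagree at index 0 without any bug being involved, which is the case the question tries to rule out.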

I tried to check the image path in the testset object, and it's still the path of the first image in the val set.

How did you check this?
I inspected the self.image attribute of the testset object, and I also plotted the image with plt.imshow(testset_[0][0].permute(1,2,0)).

Also, are you sure that something like this isn't happening?
Maybe the first (or first few) element(s) of your training set is invalid, and nonechucks has dropped them, and is showing you the first valid image, which might be the same as the first image of the test dataset?

This is a pretty far-fetched case, but just want to confirm.

I plotted testset_[0][0], which comes from the original dataset without any wrapper. After the first wrapper was created with trainset = nc.SafeDataset(trainset_), testset_[0][0] shows a different image, one that belongs to the train set.

Can you try and look at a few more images from the wrapped test set to confirm whether they are indeed all coming from the train set?
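
A check like this can be automated with a small helper. The sketch below is only an illustration: mismatched_indices and IdentityWrapper are hypothetical names, and it assumes samples are deterministic and comparable with ==. With a dataset that applies random crops, comparing file paths (for example the self.image list) would be a more reliable signal than comparing tensors.

```python
def mismatched_indices(wrapped, raw, num_samples=5):
    """Return the indices (among the first num_samples) where the two differ."""
    n = min(num_samples, len(wrapped), len(raw))
    return [i for i in range(n) if wrapped[i] != raw[i]]


class IdentityWrapper:
    """Hypothetical wrapper that forwards every access unchanged."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        return self.dataset[index]


train_paths = ["train_0.jpg", "train_1.jpg", "train_2.jpg"]
val_paths = ["val_0.jpg", "val_1.jpg", "val_2.jpg"]

# A correct wrapper around the val set agrees with the raw val set.
ok = mismatched_indices(IdentityWrapper(val_paths), val_paths)

# A wrapper that (wrongly) serves train samples disagrees everywhere.
bad = mismatched_indices(IdentityWrapper(train_paths), val_paths)
```

If the wrapper legitimately drops invalid samples, some mismatches are expected, so a non-empty result is a hint to look closer rather than proof of a bug.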

I double-checked with other indices: testset_ still returns the wrong images after the line trainset = nc.SafeDataset(trainset_) is executed.
I have pasted the code of my_data. I use the Pascal VOC 2012 dataset for segmentation. The only thing that could raise an exception is the crop in __getitem__. Even when I use crop parameters that definitely raise an exception, I still get the same result. Could you help me point out what is wrong in my code? The code works fine with the official PyTorch datasets, but I have to handle the exception raised when the crop size is bigger than the image. Thanks for your wrapper!
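
As an aside on the crop exception mentioned here, the size check can be made explicit before cropping. The helper below is a hypothetical sketch, not part of my_data or nonechucks. It assumes the image size is given as (width, height), as in PIL's Image.size, and the crop size as (height, width), as passed to transforms.RandomCrop.get_params in the code above.

```python
def crop_fits(image_size, crop_size):
    """Return True if a crop of crop_size fits inside an image of image_size.

    image_size: (width, height), the order used by PIL's Image.size.
    crop_size:  (height, width), the order used for self.shape in my_data.
    """
    width, height = image_size
    crop_h, crop_w = crop_size
    return crop_h <= height and crop_w <= width


# A 500x375 (width x height) image can hold a 140x224 (height x width) crop.
fits_large = crop_fits((500, 375), (140, 224))

# A 200x100 image cannot: the crop is taller than the image.
fits_small = crop_fits((200, 100), (140, 224))
```

Whether a failing sample should raise or be reported some other way depends on how the wrapper detects invalid samples, which this sketch does not assume.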

Okay, since I don't have access to your running code, I might have to ask you to do a couple of tests to find out the exact cause.

Could you try the following?

  1. Switch the order in which you call the wrappers around the train and test datasets
  2. Comment out the initialization of the train set and then test it

Thanks!

After switching the order of trainset and testset, all the images come from the test set. After commenting out the trainset, I also get images from the test set. It seems that I can only get images from the first wrapped dataset. Is this right?

import matplotlib.pyplot as plt
import nonechucks as nc
import torch
import torchvision

trainset_=my_data((112,196),transform=image_transform)
testset_=my_data((112,196),image_set='val',transform=image_transform)

trainset=nc.SafeDataset(trainset_)  # switch this line and the next line
testset=nc.SafeDataset(testset_)
trainloader=nc.SafeDataLoader(trainset,batch_size=4)
testloader=nc.SafeDataLoader(testset,batch_size=4)
a,b=next(iter(trainloader))
c,d=next(iter(testloader))
# Undo the normalization (std and mean below) on the test batch, then tile it into a grid
grid=torchvision.utils.make_grid((c.permute(0,2,3,1)*torch.tensor((0.229, 0.224, 0.225))+torch.tensor((0.485, 0.456, 0.406))).permute(0,3,1,2),nrow=4)
plt.imshow(grid.permute(1,2,0))
plt.show()

@eejackliu I think I have narrowed down the issue to the memoization used in SafeDataset. I've been quite busy over the week and will try to look into it as soon as possible.
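
For readers wondering how memoization could produce this symptom, here is one possible mechanism, sketched with toy classes. This is an assumption for illustration only: it is not the actual nonechucks code and not a description of what commit 37eee52 changed. All names (shared_memoize, per_instance_memoize, BuggyWrapper, FixedWrapper) are hypothetical. The sketch shows a cache keyed on the index alone and shared by all instances, so the second wrapper returns items cached by the first.

```python
import functools


def shared_memoize(func):
    """Deliberately buggy: one cache for all instances, keyed on index only."""
    cache = {}

    @functools.wraps(func)
    def wrapper(self, index):
        if index not in cache:
            cache[index] = func(self, index)
        return cache[index]

    return wrapper


def per_instance_memoize(func):
    """Keeps a separate cache on each instance."""

    @functools.wraps(func)
    def wrapper(self, index):
        cache = self.__dict__.setdefault("_item_cache", {})
        if index not in cache:
            cache[index] = func(self, index)
        return cache[index]

    return wrapper


class BuggyWrapper:
    def __init__(self, dataset):
        self.dataset = dataset

    @shared_memoize
    def __getitem__(self, index):
        return self.dataset[index]


class FixedWrapper:
    def __init__(self, dataset):
        self.dataset = dataset

    @per_instance_memoize
    def __getitem__(self, index):
        return self.dataset[index]


train, val = ["train_0", "train_1"], ["val_0", "val_1"]

buggy_train, buggy_val = BuggyWrapper(train), BuggyWrapper(val)
buggy_results = (buggy_train[0], buggy_val[0])  # second access hits the shared cache

fixed_train, fixed_val = FixedWrapper(train), FixedWrapper(val)
fixed_results = (fixed_train[0], fixed_val[0])
```

In the buggy variant, whichever wrapper is accessed first determines what every later wrapper returns for the same index, which matches the observation that only images from the first wrapped dataset appear.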

Can you verify if the latest commit fixes the problem? 37eee52

Yes! It works! Thanks for your repo!