msamogh/nonechucks

KeyError:(<function SafeDataset.__getitem__ at ...>)

minushuang opened this issue · 2 comments

Hi, I tried to create a dataset from a CSV file using SafeDataset, but it failed with the error
KeyError: (<function SafeDataset.__getitem__ at ...>)
Details

Traceback (most recent call last):
  File "test.py", line 77, in <module>
    main()
  File "test.py", line 59, in main
    for batch in test_loader:
  File "/home/hotel_ai/python3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/home/hotel_ai/python3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
  File "/home/hotel_ai/python3/lib/python3.5/site-packages/nonechucks/utils.py", line 49, in __call__
    res = cache[key]
KeyError: (<function SafeDataset.__getitem__ at 0x7f81b186b950>, (0,), frozenset())

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hotel_ai/python3/lib/python3.5/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/hotel_ai/python3/lib/python3.5/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/hotel_ai/python3/lib/python3.5/site-packages/nonechucks/utils.py", line 51, in __call__
    res = cache[key] = self.func(*args, **kw)
  File "/home/hotel_ai/python3/lib/python3.5/site-packages/nonechucks/dataset.py", line 96, in __getitem__
    raise IndexError
IndexError

Here is my code:

test_set = ImageSet('./test.csv', test_transforms)
test_set = nc.SafeDataset(test_set)

ImageSet code, which opens images from an HTTP source:

import io
import urllib.request

from PIL import Image
from torch.utils import data


class ImageSet(data.Dataset):
    def __init__(self, data_txt, data_transforms):
        # Read the image paths from the CSV file, skipping the header row.
        data_list = []
        with open(data_txt, "r") as f:
            for line in f.readlines()[1:]:
                tmp = line.strip().split(',')
                data_list.append(tmp[1])
        self.data_list = data_list
        self.transforms = data_transforms

    def __getitem__(self, index):
        url_prefix = 'this is a http-url-prefix such as: http://images.baidu.com/'

        data_path = self.data_list[index]

        # Download the image over HTTP and decode it with PIL.
        file0 = urllib.request.urlopen(url_prefix + data_path)
        image_file0 = io.BytesIO(file0.read())
        data = Image.open(image_file0)
        if data.mode != 'RGB':
            data = data.convert("RGB")

        data = self.transforms(data)

        return data, data_path

    def __len__(self):
        return len(self.data_list)
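One thing worth noting about this __getitem__ is that every call blocks on a network download, so a single unreachable URL can stall a worker. Below is a small sketch of a fetch helper with a request timeout that fails fast so SafeDataset can skip the sample; fetch_image and the 5-second value are illustrative assumptions, not part of the original code.

import io
import urllib.request

from PIL import Image


def fetch_image(url, timeout=5.0):
    # A slow or unreachable host raises instead of hanging the worker,
    # which lets SafeDataset drop the sample and move on.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        image = Image.open(io.BytesIO(resp.read()))
    return image.convert("RGB") if image.mode != "RGB" else image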

Sorry, my fault: I was using the default DataLoader in my code. Replacing it with SafeDataLoader solved the problem. But I have another question: the performance doesn't seem very good in my case, about 215 seconds for 5000 images with ResNet-50.
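For context, here is a minimal sketch of what the SafeDataLoader setup might look like; the batch size, worker count, and test_transforms name are placeholders, and it assumes nc.SafeDataLoader accepts the standard DataLoader keyword arguments.

import nonechucks as nc

# Hypothetical wiring; './test.csv' and test_transforms stand in for the real inputs.
test_set = nc.SafeDataset(ImageSet('./test.csv', test_transforms))

# SafeDataLoader pairs with SafeDataset so that samples whose __getitem__
# raises are skipped instead of crashing the batch, unlike the stock DataLoader.
test_loader = nc.SafeDataLoader(test_set, batch_size=32, num_workers=4)

for images, paths in test_loader:
    pass  # run the ResNet-50 forward pass on each batch here

When __getitem__ is network-bound like this, the number of workers usually matters more for throughput than the model's speed.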

Can you show me how you are initializing your (Safe)DataLoader? Would be helpful to see if you are using multiple workers, etc.