
NotImplementedError for __len__ function in SafeDataLoader

sannawag opened this issue · 8 comments

Hi @msamogh,

Having a __len__ function in SafeDataLoader, identical to the one in, would be very helpful. I currently get the following error:

dataloader = nc.SafeDataLoader(dataset)
if i == len(dataloader):
File "envs/deep/lib/python3.6/site-packages/torch/utils/data/", line 504, in __len__
return len(self.batch_sampler)
File "envs/deep/lib/python3.6/site-packages/torch/utils/data/", line 150, in __len__
return (len(self.sampler) + self.batch_size - 1) // self.batch_size
File "envs/deep/lib/python3.6/site-packages/torch/utils/data/", line 20, in __len__
raise NotImplementedError

Thank you,

Hi @sannawag,
Can you tell me a bit more about what you are trying to do? It would help me understand your situation better so that I can help you.

@msamogh, given the large size of my training dataset, I wish to validate more often than once per epoch. For this reason, when enumerating the dataloader, I check whether the index equals the length of the dataloader - 1. I do not directly have access to the dataset length because I initialize the dataloaders in a separate function.

So if I understand you correctly, you wish to enumerate through a single DataLoader in a nested fashion?

I do wish to enumerate through DataLoaders in a nested fashion, but one is built from a training set and the other from a validation set.

So what's preventing you from enumerating through the validation set in the usual way (using enumerate())? You can reinitialize your validation set DataLoader inside the loop as many times as you want.

Here is the basic structure I am trying to obtain:

for i, sample in enumerate(training_dataloader):
    # process the training sample
    if i % step == 0 or i == len(training_dataloader) - 1:
        ...  # run validation on the validation dataloader

The catch is computing len(training_dataloader).


Ah, I see. The bad news is that you can't call len() on your DataLoader (unless you're okay with setting eager_eval to True on your dataset). The good news is that, in this case, you can simply move the last-iteration check outside the loop, so that it runs once the loop has ended.
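A rough sketch of that restructuring, assuming the training loop only needs an iterable and never its length. The names `train_with_periodic_validation`, `train_step`, `validate`, and `valid_samples` are hypothetical placeholders, and the generator is just a stand-in for a loader that silently skips bad samples, not the actual SafeDataLoader:

```python
def valid_samples(samples):
    """Hypothetical stand-in for a loader that skips invalid (None) samples."""
    for sample in samples:
        if sample is not None:
            yield sample


def train_with_periodic_validation(training_batches, step, train_step, validate):
    """Train over an iterable of unknown length.

    Validates whenever i % step == 0 and once more after the loop if the
    final batch did not already trigger validation. Returns the number of
    batches seen.
    """
    last_index = -1
    for i, sample in enumerate(training_batches):
        train_step(sample)
        last_index = i
        if i % step == 0:
            validate()
    # The "last iteration" check now lives outside the loop, so len() is
    # never needed. Skip it if the loop was empty or already validated.
    if last_index >= 0 and last_index % step != 0:
        validate()
    return last_index + 1
```

This should validate at the same points as the `i == len(training_dataloader) - 1` version, since the check after the loop fires exactly when the final index was not already a multiple of `step`.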

This is because without actually checking every element, SafeDataLoader has no way of telling what the effective number of valid samples in your dataset is going to be. So one of the things you will have to give up with SafeDataLoader is the ability to call len().
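To illustrate why the length can't be known up front, here is a toy example. It is not how SafeDataLoader is implemented; `LazyValidSamples` and `count_valid` are hypothetical names, and each "loader" is assumed to be a zero-argument callable that either returns a sample or raises:

```python
class LazyValidSamples:
    """Toy iterable that yields only the samples that load without raising."""

    def __init__(self, loaders):
        # Each loader is a zero-argument callable: it returns a sample or raises.
        self.loaders = loaders

    def __iter__(self):
        for load in self.loaders:
            try:
                yield load()
            except Exception:
                continue  # invalid sample: skip it silently

    # Deliberately no __len__: the number of valid samples is unknown
    # until every loader has actually been called.


def count_valid(samples):
    """Count valid samples the only way possible: by loading all of them."""
    return sum(1 for _ in samples)
```

Calling `len()` on such an object raises a `TypeError`, while `count_valid` works but costs a full pass over the data, which is presumably the kind of up-front cost that eager evaluation trades for a known length.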

Hope that helps!

Thank you @msamogh, this makes perfect sense.