msamogh/nonechucks

Should SafeDataset drop __getitem__ and inherit IterableDataset?

rehno-lindeque opened this issue · 1 comment

I took a quick look under the hood of this library because I needed to handle None values in my own dataset, but I suspect it is trying to do something impossible.

Looking at https://github.com/msamogh/nonechucks/blob/master/nonechucks/dataset.py#L87-L96, I am under the impression that __getitem__ will return the same value for multiple indices. E.g., suppose the sample at index 2 is None; then dataset[2] == dataset[3].

Surely that doesn't make sense for a well-behaved map-style dataset?
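To make the concern concrete, here is a minimal sketch of the behavior I think I'm seeing. This is not the nonechucks code: `SkipForwardDataset` is a hypothetical toy wrapper written only for illustration, and it assumes the wrapper resolves a None sample by falling through to the next valid index.

```python
class SkipForwardDataset:
    """Hypothetical toy wrapper (NOT the nonechucks implementation).

    Assumption: a None sample is handled by returning the next valid sample.
    """

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        # Length still counts the None entries.
        return len(self.samples)

    def __getitem__(self, idx):
        # Scan forward from idx until a non-None sample is found.
        for i in range(idx, len(self.samples)):
            if self.samples[i] is not None:
                return self.samples[i]
        raise IndexError(idx)


raw = ["a", "b", None, "d", "e"]
ds = SkipForwardDataset(raw)

# Indices 2 and 3 both resolve to the sample stored at raw index 3.
print(ds[2], ds[3])
```

Under that assumption, a sampler that visits every index once would see the sample after each None twice.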

Alternatively, indices could be remapped via a Dict[int, int] for random access.
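A rough sketch of what I mean by remapping, under the assumption that it is acceptable to scan the underlying data once up front to find the valid entries (which may be too expensive for some datasets). `RemappedDataset` and `index_map` are hypothetical names, not part of nonechucks.

```python
from typing import Dict


class RemappedDataset:
    """Hypothetical sketch: expose only valid samples through a dense index."""

    def __init__(self, samples):
        self.samples = samples
        # Assumption: one eager pass over the data to find non-None entries.
        valid = (i for i, s in enumerate(samples) if s is not None)
        # Maps dense public index -> index into the underlying data.
        self.index_map: Dict[int, int] = dict(enumerate(valid))

    def __len__(self):
        # Length counts only the valid samples.
        return len(self.index_map)

    def __getitem__(self, idx):
        if idx not in self.index_map:
            raise IndexError(idx)
        return self.samples[self.index_map[idx]]


raw = ["a", "b", None, "d", "e"]
ds = RemappedDataset(raw)
print(len(ds), [ds[i] for i in range(len(ds))])
```

With this layout every public index maps to a distinct underlying sample, so dataset[2] and dataset[3] no longer collide, and len() matches the number of samples that can actually be returned.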

Yes, this is not the behavior I expected but is indeed what happens.