EarthNets/Dataset4EO

DistributedSampler with IterDataPipe


Hi!

I am unable to use torch.utils.data.distributed.DistributedSampler with DataLoader. It seems that IterDataPipe is incompatible with DistributedSampler.

I attach a code snippet below; the last line, which passes the distributed sampler to the DataLoader, raises the error shown.

```python
import torch

# `bigearthnet` is the BigEarthNet datapipe module from Dataset4EO
train_dataset = bigearthnet.BigEarthNet(args.data, split='train',
                                        skip_integrity_check=True)

train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset)

# The sampler argument on the next call triggers the ValueError below,
# because an IterDataPipe is an iterable-style dataset.
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=args.batch_size, shuffle=(train_sampler is None),
    num_workers=args.workers, pin_memory=True, sampler=train_sampler)
```

```
ValueError: DataLoader with IterableDataset: expected unspecified sampler option,
but got sampler=<torch.utils.data.distributed.DistributedSampler object at 0x7fbbe1e616d0>
```

Any help or guidance is highly appreciated! Thanks in advance!

Hi @miquel-espinosa,

First, as far as I know, DistributedSampler is not supported for IterableDataset: the DataLoader requires the sampler option to be left unspecified for iterable-style datasets. Please refer to this question.
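Concretely, with an iterable-style dataset the DataLoader has to be constructed without a sampler (and without the shuffle flag; shuffling can instead be applied on the datapipe itself, e.g. via its shuffle() method). A minimal sketch based on your snippet:

```python
# Minimal sketch: no sampler and no shuffle flag for an iterable-style
# dataset; each process sees the full, unsharded stream.
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=args.batch_size,
    num_workers=args.workers, pin_memory=True)
```

Note that this alone does not shard the data across DDP ranks; every rank iterates over all samples.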

To work with the BigEarthNet dataset, we highly recommend using our code in the RSI-Classification repo. It works with Dataset4EO and supports DDP as well.

If you still need a working example of using IterDataPipe in a DDP setting (by sharding the data manually), we will try to add one ASAP.
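In the meantime, here is a minimal sketch of the manual-sharding idea. It assumes torch.distributed has been initialized; the ShardedIterable wrapper is hypothetical and not part of Dataset4EO:

```python
import itertools

import torch.distributed as dist
from torch.utils.data import DataLoader, IterableDataset, get_worker_info


class ShardedIterable(IterableDataset):
    """Yield only the samples belonging to this (rank, worker) shard,
    playing the role DistributedSampler plays for map-style datasets."""

    def __init__(self, source):
        self.source = source

    def __iter__(self):
        distributed = dist.is_available() and dist.is_initialized()
        rank = dist.get_rank() if distributed else 0
        world_size = dist.get_world_size() if distributed else 1
        info = get_worker_info()
        worker_id = info.id if info is not None else 0
        num_workers = info.num_workers if info is not None else 1
        # One global shard per (rank, dataloader worker) pair.
        shard_id = rank * num_workers + worker_id
        num_shards = world_size * num_workers
        return itertools.islice(iter(self.source), shard_id, None, num_shards)


# Usage: wrap the datapipe and build the loader without a sampler.
# train_loader = DataLoader(ShardedIterable(train_dataset),
#                           batch_size=args.batch_size,
#                           num_workers=args.workers, pin_memory=True)
```

Be aware that if the sample count is not divisible by the shard count, ranks can end up with slightly different numbers of batches, which matters for DDP synchronization. torchdata's sharding_filter() on an IterDataPipe performs a similar stride-based sharding for the per-worker part.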

Hi @ShadowXZT,
Thanks for all the information. I will have a look at the RSI-Classification repository.