DistributedSampler with IterDataPipe
Hi!
I am unable to use distributed.DistributedSampler with DataLoader. It seems like IterDataPipe is incompatible with DistributedSampler.
I attach a code snippet below. The last line, which passes the distributed sampler to the DataLoader, raises the error shown underneath.
train_dataset = bigearthnet.BigEarthNet(args.data, split='train', skip_integrity_check=True)
train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset)
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=args.batch_size, shuffle=(train_sampler is None),
    num_workers=args.workers, pin_memory=True, sampler=train_sampler)
ValueError: DataLoader with IterableDataset: expected unspecified sampler option,
but got sampler=<torch.utils.data.distributed.DistributedSampler object at 0x7fbbe1e616d0>
Any help or guidance is highly appreciated! Thanks in advance!
Hi @miquel-espinosa,
First, as far as I know, DistributedSampler is not supported for IterableDataset: samplers only apply to map-style datasets, which is exactly what the ValueError is telling you. Please refer to this question.
To work with the BigEarthNet dataset, we highly recommend using our code in the RSI-Classification repo. It works with Dataset4EO and supports DDP as well.
If you still need a working example of using IterDataPipe with DistributedSampler-like behaviour (by sharding the data manually), we will try to add one ASAP.
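In the meantime, the general idea of manual sharding looks roughly like the sketch below. This is a minimal illustration, not part of Dataset4EO: the ShardedIterable wrapper is a hypothetical name. Each DDP rank (and each DataLoader worker) keeps only its own slice of the stream, and the sampler argument is dropped from the DataLoader.

import torch.distributed as dist
from torch.utils.data import IterableDataset, DataLoader, get_worker_info

class ShardedIterable(IterableDataset):
    # Yields only the samples that belong to the current DDP rank
    # (and DataLoader worker), so no DistributedSampler is needed.
    def __init__(self, source):
        self.source = source

    def __iter__(self):
        rank = dist.get_rank() if dist.is_initialized() else 0
        world_size = dist.get_world_size() if dist.is_initialized() else 1
        worker = get_worker_info()
        num_workers = worker.num_workers if worker is not None else 1
        worker_id = worker.id if worker is not None else 0
        # Global shard index across all ranks and all workers.
        shard_id = rank * num_workers + worker_id
        num_shards = world_size * num_workers
        for i, sample in enumerate(self.source):
            if i % num_shards == shard_id:
                yield sample

# Usage sketch: wrap the datapipe and drop the sampler argument.
# train_loader = DataLoader(ShardedIterable(train_dataset),
#                           batch_size=args.batch_size,
#                           num_workers=args.workers, pin_memory=True)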
Hi @ShadowXZT,
Thanks for all the information. I will have a look at the RSI-Classification repository.