vahidk/tfrecord

is it possible to set num_workers in DataLoader with tfrecord?

TaoZappar opened this issue · 2 comments

Hi there,

I was wondering whether it is possible to set num_workers in DataLoader with tfrecord. When I set num_workers = 4, the data is repeated 4 times.

@TaoZappar
"When using an IterableDataset with multi-process data loading. The same dataset object is replicated on each worker process, and thus the replicas must be configured differently to avoid duplicated data. See IterableDataset documentations for how to achieve this."

https://pytorch.org/docs/stable/data.html#iterable-style-datasets
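
To illustrate what the quoted passage means, here is a minimal sketch of the sharding arithmetic in plain Python, so it runs without PyTorch installed. It is not the tfrecord library's code. The names `read_records` and `worker_stream` are hypothetical, and the ten integer "records" stand in for a real file. In an actual `IterableDataset.__iter__`, the worker id and worker count would come from `torch.utils.data.get_worker_info()` rather than being passed in.

```python
from itertools import islice


def read_records():
    """Hypothetical stand-in for reading one TFRecord file: 10 fake records."""
    return iter(range(10))


def worker_stream(worker_id, num_workers, shard=True):
    """Return the records a single worker process would yield.

    Assumption: in real PyTorch code, worker_id and num_workers would be read
    from torch.utils.data.get_worker_info() (its .id and .num_workers fields).
    """
    records = read_records()
    if not shard:
        # Every worker holds a full replica, so each one yields everything.
        return list(records)
    # Each worker keeps every num_workers-th record, offset by its own id.
    return list(islice(records, worker_id, None, num_workers))


num_workers = 4

# Without sharding, the four replicas together produce the data four times.
unsharded = [r for w in range(num_workers)
             for r in worker_stream(w, num_workers, shard=False)]

# With sharding, the workers together cover each record exactly once.
sharded = [r for w in range(num_workers)
           for r in worker_stream(w, num_workers)]

print(len(unsharded))   # 40
print(sorted(sharded))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The unsharded case reproduces the symptom in the question: with 4 workers, 10 records turn into 40. Striding by worker id is only one way to split the stream; assigning whole files or byte ranges to each worker would also work.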

TFRecordDataset loads and returns the data only once. MultiTFRecordDataset makes no such guarantee: it interleaves multiple datasets according to the given sampling ratios, so it cannot both honor those ratios and return every record exactly once.