Is it possible to set num_workers in DataLoader with tfrecord?
TaoZappar opened this issue · 2 comments
Hi, there,
I was wondering whether it is possible to set num_workers in DataLoader with tfrecord. When I set num_workers = 4, the data is repeated 4 times.
@TaoZappar
"When using an IterableDataset with multi-process data loading. The same dataset object is replicated on each worker process, and thus the replicas must be configured differently to avoid duplicated data. See IterableDataset documentations for how to achieve this."
https://pytorch.org/docs/stable/data.html#iterable-style-datasets
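For reference, here is a minimal sketch of the per-worker configuration the docs describe, not how the tfrecord package does it internally. `ShardedDataset` and `shard` are hypothetical names made up for illustration; the only real PyTorch pieces are `IterableDataset` and `torch.utils.data.get_worker_info()`. The idea is that each worker keeps every `num_workers`-th record, offset by its own worker id, so the replicas no longer overlap.

```python
import itertools

from torch.utils.data import IterableDataset, get_worker_info


def shard(iterable, worker_id, num_workers):
    # Keep items worker_id, worker_id + num_workers, worker_id + 2 * num_workers, ...
    return itertools.islice(iterable, worker_id, None, num_workers)


class ShardedDataset(IterableDataset):
    """Hypothetical example dataset: splits `records` across DataLoader workers."""

    def __init__(self, records):
        self.records = records

    def __iter__(self):
        info = get_worker_info()
        if info is None:
            # Single-process loading (num_workers=0): yield everything.
            return iter(self.records)
        # Multi-process loading: each worker yields only its own slice.
        return shard(self.records, info.id, info.num_workers)
```

Passing such a dataset to `DataLoader(dataset, num_workers=4)` should then yield each record once per epoch instead of four times, although the order across workers is interleaved rather than sequential.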
TFRecordDataset will only load and return the data once. MultiTFRecordDataset isn't guaranteed to return each record exactly once: it interleaves multiple datasets according to the given sampling ratios, so it can't both honor those ratios and return every record exactly once.