pytorch/data

FileExistsError when using `on_disk_cache` and multiple workers

knoriy opened this issue · 1 comment

knoriy commented

🐛 Describe the bug

When multiple workers try to create the same folder structure at the same time, the operation fails and raises `FileExistsError`.

Expected behavior: the folder structure is created and processing continues.

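import torchdata
from torch.utils.data import DataLoader

# `urls` and `filepath_fn` are defined elsewhere (not shown in this report)
datapipe = torchdata.datapipes.iter.IterableWrapper(urls)\
        .on_disk_cache(filepath_fn=filepath_fn)\
        .open_files_by_fsspec(mode='rb')\
        .end_caching(filepath_fn=filepath_fn)

dl = DataLoader(datapipe, batch_size=16, num_workers=96, persistent_workers=True)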
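
For illustration, here is a minimal sketch of this kind of race, independent of torchdata. It is not the library's actual code and not necessarily what the fix does; the helper names (`ensure_dir_racy`, `ensure_dir_safe`, `create_from_many_workers`) are hypothetical, and threads stand in for DataLoader worker processes. It assumes the failure comes from a check-then-create pattern, which `os.makedirs(..., exist_ok=True)` avoids.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def ensure_dir_racy(path: str) -> None:
    # Check-then-create: another worker can create `path` between the
    # exists() check and makedirs(), which then raises FileExistsError.
    if not os.path.exists(path):
        os.makedirs(path)


def ensure_dir_safe(path: str) -> None:
    # exist_ok=True makes creation idempotent: an already-existing
    # directory is not an error, whichever worker created it.
    os.makedirs(path, exist_ok=True)


def create_from_many_workers(root: str, num_workers: int = 32) -> str:
    # Hypothetical stand-in for many workers that all need the same
    # cache directory at once.
    target = os.path.join(root, "cache", "a", "b")
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # list() consumes every result, re-raising any worker exception.
        list(pool.map(ensure_dir_safe, [target] * num_workers))
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(os.path.isdir(create_from_many_workers(root)))
```

Swapping `ensure_dir_safe` for `ensure_dir_racy` in the pool can intermittently raise `FileExistsError`, depending on scheduling, which would match the behavior reported here.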

Versions

torchdata==0.6.1

knoriy commented

I've opened a pull request that fixes this issue.