pytorch/data

Caching doesn't work with cycle

Modexus opened this issue · 0 comments

๐Ÿ› Describe the bug

When cycle() is followed by caching, the pipeline works the very first time, but afterwards it crashes: the demux inside on_disk_cache keeps checking indefinitely for "todo" files even though all files are already cached.

from torchdata.datapipes.iter import IterableWrapper

dp = IterableWrapper(["test"])
dp = dp.cycle()
dp = dp.on_disk_cache(filepath_fn=lambda x: f"./{x}")
dp = dp.map(lambda x: (x, x))
dp = dp.end_caching(mode="t", same_filepath_fn=True)

next(iter(dp))
next(iter(dp))

This happens because of how demux works, so the same issue can be reproduced with just cycle and demux:

from torchdata.datapipes.iter import IterableWrapper

dp = IterableWrapper(["test"])
dp = dp.cycle()
dp0, dp1 = dp.demux(2, lambda x: 1)

next(iter(dp0))
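To illustrate the mechanism, here is a minimal pure-Python sketch (not the torchdata implementation; function and parameter names are made up): a demux-style reader keeps pulling from the shared source until it finds an element classified for its own partition. With an infinite cycled source whose classifier never routes anything to that partition, the pull loop can only terminate by hitting a buffer limit.

from itertools import cycle

def demux_read(source, classify, want, buffer_limit=10):
    # Pull from the shared source until an element classified as
    # `want` appears; elements destined for other partitions would
    # normally be buffered for their readers. With an infinite
    # source that never matches, the buffer limit is the only
    # thing that stops the loop.
    buffered = 0
    for item in source:
        if classify(item) == want:
            return item
        buffered += 1
        if buffered >= buffer_limit:
            raise BufferError("demux buffer limit exceeded")

src = cycle(["test"])              # infinite, like dp.cycle()
try:
    demux_read(src, lambda x: 1, want=0)
except BufferError as e:
    print(e)                       # demux buffer limit exceeded

This mirrors the repro above: the classifier lambda x: 1 sends everything to partition 1, so reading from partition 0 of a cycled source never succeeds.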

Versions

torchdata.version=='0.7.0a0+deeacb4'