pytorch/data

Roadmap for mixed chain of multithread and multiprocessing pipelines?

npuichigo opened this issue · 2 comments

🚀 The feature

pypeln has a nice feature for chaining pipeline stages that run on different kinds of workers, including processes, threads, or asyncio.

data = (
    range(10)
    | pl.process.map(slow_add1, workers=3, maxsize=4)
    | pl.thread.filter(slow_gt3, workers=2)
    | pl.sync.map(lambda x: print(x))
    | list
)
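
For readers without pypeln installed, the same idea (a process-backed map feeding a thread-backed filter) can be roughly sketched with the standard library alone. This is only an illustration under assumptions: the bodies of `slow_add1` and `slow_gt3` are not given in this issue and are made up here, `keep_if_gt3` and `map_then_filter` are hypothetical helper names, and nothing below is pypeln's implementation or an existing TorchData API. Unlike `maxsize=4` above, `Executor.map` submits all items up front, so this sketch has no backpressure between stages.

```python
import time
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def slow_add1(x):
    # Assumed body (not given in the issue): pretend to be slow, then add 1.
    time.sleep(0.01)
    return x + 1


def slow_gt3(x):
    # Assumed body (not given in the issue): pretend to be slow, then test x > 3.
    time.sleep(0.01)
    return x > 3


def keep_if_gt3(x):
    # Hypothetical helper: pair each value with its predicate result so the
    # filter stage can run on a pool and still return the value itself.
    return x, slow_gt3(x)


def map_then_filter(items, map_pool: Executor, filter_pool: Executor):
    """Hypothetical helper: map on one pool, then filter on another."""
    # Stage 1: results are yielded in input order as workers finish them.
    mapped = map_pool.map(slow_add1, items)
    # Stage 2: each mapped value is handed to the second pool as soon as
    # stage 1 yields it, so the two stages overlap in time.
    flagged = filter_pool.map(keep_if_gt3, mapped)
    return [value for value, keep in flagged if keep]


if __name__ == "__main__":
    # Mixed chain: 3 worker processes for the map, 2 worker threads for the filter.
    with ProcessPoolExecutor(max_workers=3) as procs, \
            ThreadPoolExecutor(max_workers=2) as threads:
        print(map_then_filter(range(10), procs, threads))
```

Because the worker type is just the executor passed in, swapping a stage between threads and processes is a one-line change, which is the kind of flexibility I am asking about for pytorch/data.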


I remember that the first proposal for pytorch/data claimed to support something similar. I'd like to ask whether this is still planned and, if so, what the concrete roadmap is.

Motivation, pitch

This was part of the initial proposal.

Alternatives

No response

Additional context

No response

ejguan commented

Sorry for the late response. TBH, this has been on our long-term roadmap since we created the TorchData project. Unfortunately, @NivekT and I are no longer working on TorchData. Stay tuned for updates.