Mux with MPRS causes operations after sharding_round_robin_dispatch to run on the same worker
JohnHBrock opened this issue · 3 comments
📚 The doc issue
This doesn't seem to be mentioned in the docs, but if you have two datapipes that use `sharding_round_robin_dispatch` and then `mux` them together:
- Any steps between `sharding_round_robin_dispatch` and `mux` will take place on the same worker process.
- Only the steps after the `mux` will take place on separate workers.
For example, with the below graph, the `Mapper` nodes in between the `ShardingRoundRobinDispatcher` nodes and `Multiplexer` run on the same worker process. The `Mapper` node after `Multiplexer` will run across multiple processes as they're fed data in a round-robin fashion.
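To make the placement described above concrete, here is a minimal toy model in plain Python. It does not call torchdata at all; `mux`, `simulate_placement`, and the labels `"dispatching-process"` and `"worker-N"` are hypothetical names made up for illustration. It only mirrors the behavior reported in this issue: pre-mux steps all share one process, and post-mux steps are spread over workers round-robin.

```python
def mux(*branches):
    """Interleave the branches one item at a time, stopping at the shortest one."""
    for group in zip(*branches):
        yield from group


def simulate_placement(sources, num_workers):
    """Toy model of the reported behavior; this is not torchdata code."""
    # Pre-mux Mapper steps: in the reported behavior these all run on the
    # single dispatching process, so every item gets the same label.
    branches = [
        [(item, "dispatching-process") for item in source]
        for source in sources
    ]
    trace = []
    # Post-mux Mapper step: items are handed to workers in round-robin order.
    for index, (item, pre_mux_location) in enumerate(mux(*branches)):
        post_mux_location = f"worker-{index % num_workers}"
        trace.append((item, pre_mux_location, post_mux_location))
    return trace


if __name__ == "__main__":
    # Two source branches, two workers (both values chosen arbitrarily).
    for row in simulate_placement([["a0", "a1"], ["b0", "b1"]], num_workers=2):
        print(row)
```

Each row of the trace is `(item, where the pre-mux step ran, where the post-mux step ran)`. In this model the second column never varies, which is the surprising part of the report.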
My incorrect expectation was that the dispatching process would distribute data to worker processes immediately after `sharding_round_robin_dispatch` as usual, and then everything after `mux` would take place on either one or multiple worker processes.
Suggest a potential alternative/fix
The documentation for `Multiplexer`, `ShardingRoundRobinDispatcher`, and/or `MultiProcessingReadingService` should be updated to clarify what the intended behavior is here.
I am sorry, but I think we currently don't support two `ShardingRoundRobinDispatcher` nodes.
I think that's worth putting in the docs -- I just looked and couldn't find a mention of that limitation.
> I am sorry that I think we currently don't support two `ShardingRoundRobinDispatcher`
This should potentially be taken into consideration as a use case with regard to #1174