tweag/funflow

Question: Concurrent Flows?


If step 1 gets 100 pieces of data, and I need to execute a flow for each one, does funflow support running these concurrently?

Or would I need to write a separate program that kicks off 100 flows?

How would you handle this?

Hi @seanhess, the flow runners provided with the project (runFlow or runFlowWithConfig) do run independent branches of the flow's DAG in parallel (src). Under the hood we use kernmantle's performP function, which is ultimately built on top of the async package. The main limitation is that funflow supports only multithreaded execution, not distributed execution, so a single machine would need to be able to process all 100 tasks.
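To illustrate the kind of primitive this parallelism rests on, here is a minimal sketch that uses the async package directly rather than funflow or kernmantle. `branchA`, `branchB`, and `runBoth` are hypothetical names standing in for two independent DAG branches; this is not how funflow is implemented internally, only an example of the `concurrently` combinator that async provides.

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (concurrently)

-- Hypothetical stand-in for one independent branch of a DAG.
branchA :: Int -> IO Int
branchA x = do
  threadDelay 100000 -- simulate 0.1 s of work
  pure (x + 1)

-- Hypothetical stand-in for a second, unrelated branch.
branchB :: Int -> IO Int
branchB x = do
  threadDelay 100000 -- simulate 0.1 s of work
  pure (x * 2)

-- Run both branches at the same time and wait for both results.
runBoth :: Int -> IO (Int, Int)
runBoth x = concurrently (branchA x) (branchB x)

main :: IO ()
main = runBoth 10 >>= print -- prints (11,20)
```

To spread CPU-bound work across cores, a program like this needs to be compiled with GHC's `-threaded` flag and run with `+RTS -N`.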

That's great, thanks! Multithreaded execution should provide plenty of performance. I suspect that by the time I need a whole cluster, it might make sense to switch to an event bus anyway.

In real life, my process needs to run a long flow for each item, and these per-item flows never converge. So the real flow is something that happens once per item. In a real-world application, would you include the first step (collecting the items) in the flow, so that the huge fan-out is part of it? Or would you make collection a separate function or program, and start the flow at the long per-item process?

Hi @seanhess, apologies for the delayed response. I am not actually sure whether there would be an advantage to one approach over the other. In the case of one flow with many parallel DAG branches, the branches should be processed in parallel as noted above. In the case of launching many flows in parallel from a driver program, I think the main concern would be how the driver program executes each flow and whether they are launched in parallel.

My intuition would be to start by trying the first option since that is the primary way we have used funflow and should hopefully work for your use case out of the box. Please feel free to open an issue if you run into any problems with that approach.
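For comparison, the second option (a driver program that launches one job per item) could look roughly like the sketch below. It uses only the async package. `collectItems`, `processItem`, and `runAll` are hypothetical names, and the body of `processItem` is a placeholder for whatever runs a single flow on a single item.

```haskell
import Control.Concurrent.Async (forConcurrently)

-- Hypothetical step 1: collect the items to process.
collectItems :: IO [Int]
collectItems = pure [1 .. 100]

-- Hypothetical stand-in for running one long flow on one item.
-- In a real driver this is where the flow runner would be called.
processItem :: Int -> IO Int
processItem item = pure (item * item)

-- Driver: collect the items first, then start one concurrent job per item.
-- Results come back in the same order as the input list.
runAll :: IO [Int]
runAll = do
  items <- collectItems
  forConcurrently items processItem

main :: IO ()
main = do
  results <- runAll
  putStrLn ("processed " ++ show (length results) ++ " items")
```

`forConcurrently` starts a thread for every item at once and rethrows the first exception after cancelling the remaining jobs, so a driver that needs a concurrency limit or per-item error handling would have to add that itself. When the per-item results are not needed, `forConcurrently_` discards them.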