dashbitco/flow

Unexpected parallelism with Flow.from_enumerables

lackac opened this issue · 1 comments

Looking through the code I found a theoretical issue or at least a potential source of confusion.

Let's look at an example:

enums_with_urls
|> Flow.from_enumerables(stages: 4, max_demand: 1)
|> Flow.map(&fetch_url/1)
|> Flow.run()

After looking at this code one would expect to have 4 processes that do the job of fetching urls from a particular API. That might be important because of throttling or just to prevent flooding a service.

However, if the input list contains more than 4 enumerables Flow will trust the job of the mapper operations to the generated GenStage.Streamer processes. Let's imagine that the input is generated from a directory listing where we have 100 files, each mapped to a Stream with File.stream!/3. Now we have 100 processes each fetching pages concurrently from some service that is suddenly getting more heat than expected. :)

I think we should either highlight this case in the documentation of Flow.from_enumberables/2 or remove this clause from the case expression.

Good point, let's remove the clause.