DavisVaughan/furrr

Unbalanced Load distribution on first run


Hello,

I'm seeing uneven load distribution across parallel jobs run with future_map. Specifically, the first time I call future_map, the work runs on only 2-3 workers, whereas on subsequent runs of the same call it is shared evenly across all workers.
I tried to narrow it down to a reprex:

library(future)
library(tictoc)

plan(multisession, workers = 10)

tic()
res <- purrr::map(
  .x = 1:1e6, .f = ~ .x + 1
)
toc()
# 1.92 sec, not in parallel

tic()
furrr::future_map(
  .x = 1:1e6, .f = ~ .x + 1
)
toc()
# 3.462 sec on first run

microbenchmark::microbenchmark(
  {
    furrr::future_map(
      .x = 1:1e6, .f = ~ .x + 1
    )
  },
  times = 20
)
# 1.2 sec on average on subsequent runs

In my "real-world" applications, where a considerable amount of data also has to be passed to the workers, the imbalance tends to be more pronounced.
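
To check how the elements are actually split across workers, something like the following could help. This is only a minimal sketch: it assumes that Sys.getpid() is enough to tell the multisession workers apart, and the 4 workers and 1000 elements are arbitrary values chosen for illustration, not my real setup.

```r
library(future)
library(furrr)

# Assumption: 4 workers is an arbitrary choice for this illustration
plan(multisession, workers = 4)

# Record the PID of the worker process that handles each element
pids <- future_map_int(1:1000, ~ Sys.getpid())

# Number of elements handled by each worker process
per_worker <- table(pids)
print(per_worker)

# Shut the workers down again
plan(sequential)
```

Note that this only shows how elements are assigned to workers, not when each worker starts or finishes, so on its own it cannot explain the timing difference between the first and later runs.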

This might be related to previous issue #3.

I'm working on the following system:

R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Any thoughts appreciated.

Best,
Maximilian