work with a list of data frames

Question

Closed this issue 3 years ago · 4 comments

I use purrr::map() to apply a function to each data frame in a list, but future_map() does not seem to handle this situation: make_chunks() cannot split the list into even parts.

Is there a workaround that would let me use parallel computing in this particular situation?

Answer 1 · 2021-09-23T13:30:00.000Z

Can you please provide a full reproducible example that works with map() but not with future_map()? Unfortunately, I am not sure what you are describing.

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page.

You can install reprex by running (you may already have it, though, if you have the tidyverse package installed):

install.packages("reprex")

Thanks

Answer 2 · 2021-09-23T13:50:19.000Z

Thanks for your advice, and sorry about the bad example. Here is an example made with reprex; I hope this is the proper place to discuss it. The background is that I want to run my simulation with various sets of parameters using purrr::map(). I replaced purrr::map() with furrr::future_map() to try to make use of all 8 of my CPU cores, but the performance does not seem to improve significantly.

What could be the problem here? Could it be because of the data structure of testcombo?

devtools::install_github("rsetienne/DDD@tianjian_Rampal")
#> Skipping install of 'DDD' from a github remote, the SHA1 (11629491) has not changed since last install.
#>   Use `force = TRUE` to force installation
devtools::install_github("EvoLandEco/eve")
#> Skipping install of 'eve' from a github remote, the SHA1 (8cc5ea66) has not changed since last install.
#>   Use `force = TRUE` to force installation

testcombo <- eve::edd_combo_maker(
  la = c(0.5, 0.3),
  mu = c(0.1, 0.2),
  beta_n = -0.0001,
  beta_phi = -0.0001,
  gamma_n = 0.0001,
  gamma_phi = 0.0001,
  age = c(3, 5),
  model = "dsde2",
  metric = c("ed", "pd"),
  offset = "none"
)

future_opts <- furrr::furrr_options(seed = TRUE)

testfuna <- function(testcombo, future_opts) {
  future::plan(future::sequential)
  furrr::future_map(
    .x = testcombo,
    .f = eve::edd_wrapper,
    .options = future_opts,
    nrep = 3,
    make_plot = FALSE,
    make_stat = FALSE,
    plot_opt = NULL,
    stat_opt = NULL
  )
}

testfunb <- function(testcombo, future_opts) {
  future::plan(future::multisession, workers = 8)
  furrr::future_map(
    .x = testcombo,
    .f = eve::edd_wrapper,
    .options = future_opts,
    nrep = 3,
    make_plot = FALSE,
    make_stat = FALSE,
    plot_opt = NULL,
    stat_opt = NULL
  )
}

microbenchmark::microbenchmark(testfuna(testcombo, future_opts),
                               testfunb(testcombo, future_opts),
                               times = 5L)
#> Unit: seconds
#>                              expr      min       lq     mean   median       uq
#>  testfuna(testcombo, future_opts) 7.947545 8.496104 9.475211 9.262720 10.11764
#>  testfunb(testcombo, future_opts) 6.920686 7.159854 8.402481 7.214634 10.26396
#>       max neval
#>  11.55204     5
#>  10.45327     5

Created on 2021-09-23 by the reprex package (v2.0.1)

Answer 3 · 2021-09-23T17:24:03.000Z

I tested larger simulations as well, and the performance gain from parallelization is much better there.

Answer 4 · 2021-09-23T17:36:53.000Z

That is good to see. You also have to consider that:

It takes time to send data to and from the workers, so with small tests that overhead often dominates the timing.
Starting up the workers also takes some time. The line future::plan(future::multisession, workers = 8) is inside your benchmark, and starting the 8 workers probably takes 2-3 seconds.
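
To illustrate the two points above, here is a minimal sketch that starts the workers once, outside the timed code, so that only the mapping step is measured. The list dfs and the function slow_summary are hypothetical stand-ins for testcombo and eve::edd_wrapper, and the 0.2 second sleep is just an assumed placeholder for real simulation work.

```r
library(future)
library(furrr)

# Hypothetical stand-in for the simulation input: a list of data frames.
dfs <- split(mtcars, mtcars$cyl)

# Hypothetical stand-in for the expensive function applied to each element.
slow_summary <- function(df) {
  Sys.sleep(0.2)  # pretend this is an expensive simulation step
  colMeans(df)
}

# Start the workers once, before anything is timed.
plan(multisession, workers = 2)

# Time only the mapping step, not the worker startup.
elapsed <- system.time(
  res_parallel <- future_map(dfs, slow_summary)
)[["elapsed"]]

# Sequential reference result for comparison.
res_sequential <- lapply(dfs, slow_summary)

# Switch back to sequential processing, which also shuts the workers down.
plan(sequential)
```

With a workload this small the parallel version may still not be faster, because sending the data to and from the workers remains part of the timed step; the gap should narrow or reverse as each call does more work.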