Nested Futures Use More Memory Than They Should

Question

Opened this issue a year ago · 2 comments

I've been running code with nested loops that keeps hitting memory limits, and I have tried to reduce it to a small example that shows the problem. The example takes a random square matrix and builds a list of its columns. You obviously wouldn't use a double loop for this in R, but it makes for a simple, clear demonstration: with purrr the double loop doesn't increase memory usage, while with furrr and future.apply memory usage explodes.

library(bench)
library(furrr)
library(future.apply)
library(purrr)

# purrr
single_loop <- function(x, n) {
  map(1:n, ~ x[, .x])
}

# future.apply
single_loop_a <- function(x, n) {
  future_lapply(1:n, FUN = function(i) x[, i])
}

# furrr
single_loop_f <- function(x, n) {
  future_map(1:n, ~ x[, .x])
}

# purrr
inner_loop <- function(i, n, x) {
  map_dbl(1:n, ~ x[.x, i])
}

outer_loop <- function(x, n) {
  map(1:n, ~ inner_loop(.x, n, x = x))
}

# future.apply
inner_loop_a <- function(i, n, x) {
  future_sapply(1:n, FUN = function(j) x[j, i])
}

outer_loop_a <- function(x, n) {
  future_lapply(1:n, FUN = function(i) inner_loop_a(i, n, x))
}

# furrr
inner_loop_f <- function(i, n, x) {
  future_map_dbl(1:n, ~ x[.x, i])
}

outer_loop_f <- function(x, n) {
  future_map(1:n, ~ inner_loop_f(.x, n, x = x))
}

n <- 100
x <- matrix(rnorm(n * n), nrow = n)

identical(single_loop(x, n), single_loop_f(x, n))
identical(single_loop(x, n), single_loop_a(x, n))
identical(single_loop(x, n), outer_loop(x, n))
identical(single_loop(x, n), outer_loop_a(x, n))
identical(single_loop(x, n), outer_loop_f(x, n))
# All return TRUE

plan(sequential)

# With a single loop memory usage is similar
bench::mark(single_loop(x, n))$mem_alloc
# 127KB
bench::mark(single_loop_a(x, n))$mem_alloc
# 243KB
bench::mark(single_loop_f(x, n))$mem_alloc
# 340KB

# With a double loop memory usage remains similar for purrr, but explodes 
# on the other two
bench::mark(outer_loop(x, n))$mem_alloc
# 83.6KB
bench::mark(outer_loop_a(x, n))$mem_alloc
# 11.8MB
bench::mark(outer_loop_f(x, n))$mem_alloc
# 21.1MB

# Try again with a larger matrix
n <- 5000
x <- matrix(rnorm(n * n), nrow = n)

bench::mark(single_loop(x, n))$mem_alloc
# 287MB
bench::mark(single_loop_a(x, n))$mem_alloc
# 287MB
bench::mark(single_loop_f(x, n))$mem_alloc
# 287MB

bench::mark(outer_loop(x, n))$mem_alloc
# 191MB
bench::mark(outer_loop_a(x, n))$mem_alloc
# 2.88GB
bench::mark(outer_loop_f(x, n))$mem_alloc
# 1.57GB

As you can see, with purrr the double loop uses somewhat less memory than the single loop (191MB versus 287MB at n = 5000), while with future.apply and furrr it causes memory usage to explode (2.88GB and 1.57GB respectively). I ran this example on a 2023 MacBook, but the actual code I am trying to fix runs on a Linux cluster. I used both furrr and future.apply because yesterday I logged a bug report about nested loops using future.callr, and @HenrikBengtsson pointed out that it was only an issue with furrr. Please let me know if there is any additional information I can provide or any help I can give in solving this issue, and thanks for the wonderful collection of packages!
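One possible way to sidestep the nesting while this is investigated, sketched under the assumption that only the outer loop needs to run in parallel: keep future_lapply on the outside and use a plain vapply on the inside, so that no futures are created within futures. The names inner_loop_seq and outer_loop_flat are made up for this sketch, and I have not measured whether it brings memory usage back to the single-loop level.

```r
library(future.apply)

plan(sequential)

# Inner loop without futures: a plain vapply over the rows of column i
inner_loop_seq <- function(i, n, x) {
  vapply(seq_len(n), function(j) x[j, i], numeric(1))
}

# Only the outer loop goes through the future framework
outer_loop_flat <- function(x, n) {
  future_lapply(seq_len(n), FUN = function(i) inner_loop_seq(i, n, x))
}

n <- 100
x <- matrix(rnorm(n * n), nrow = n)

res <- outer_loop_flat(x, n)

# Same result as extracting the columns directly
identical(res, lapply(seq_len(n), function(i) x[, i]))
```

The same pattern should apply to furrr by using future_map on the outside and map_dbl on the inside.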

Answer 1 · 2024-01-05T13:58:33.000Z

A little more information. I don't know much about memory profiling, so apologies if this is not the best way to present it, but I hope it is helpful.

library(profmem)
library(tidyverse)

n <- 100
x <- matrix(rnorm(n * n), nrow = n)

single_loop(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1    201      130448

single_loop_a(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1    364      251360

single_loop_f(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1    475      353128

outer_loop(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1    101       85648

outer_loop_a(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1  17101    12623144

outer_loop_f(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
#   allocs total_bytes
# 1  27812    22595240
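
To make the comparison easier to read, the totals can be put side by side. This is only arithmetic on the total_bytes values already reported above for n = 100; the vector names single and nested are mine.

```r
# total_bytes reported by profmem above, n = 100
single <- c(purrr = 130448, future.apply = 251360, furrr = 353128)
nested <- c(purrr = 85648, future.apply = 12623144, furrr = 22595240)

# How many times more memory the nested version allocates than the single loop
ratio <- nested / single
print(round(ratio, 1))
```

The nested purrr version allocates less than its single loop, while the nested future.apply and furrr versions allocate roughly 50 and 64 times more than theirs.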

Answer 2 · 2024-10-10T16:29:26.000Z

Update: I fixed my problem. It turns out I was not familiar with how future handles environments. See https://furrr.futureverse.org/articles/gotchas.html and https://furrr.futureverse.org/articles/carrier.html

I think I ran into the same problem, or at least a very similar one. Apologies for the somewhat convoluted data construction, but it simulates the data I was using when I first encountered the issue. Here's my reprex:

library(future)
library(furrr)
library(purrr)

logistic_model <- function(feature, df_other_vars, formula) {
  df <- dplyr::bind_cols(df_other_vars, "x" = feature)
  
  m <- glm(formula(formula), 
           data = df, 
           family = binomial(logit))
  
  return(m)
}

nested_map <- function(imputed_versions_feature, ...) {
  models <- imputed_versions_feature |>
    purrr::map(\(imputed_version_feature) 
               logistic_model(feature = imputed_version_feature, ...))
  
  return(models[1]) # originally mice::pool call, but not necessary for demonstration
}

gen_names <- function(n = 1) {
  mz <- runif(min = 10, max = 200, n = n) |> signif(7)
  rt <- runif(min = 0, max = 12, n = n) |> signif(7)
  string <- glue::glue("X{mz}_{rt}")
  return(string)
}

gen_x <- function(dummy, nr_imputations = 60, n = 1000) {
  # `dummy` is ignored; it only lets purrr::map() call this once per feature
  replicate(nr_imputations, rnorm(n)) |> tibble::as_tibble()
}

list_of_feature_dfs <- gen_names(1024) |> 
  tibble::as_tibble() |> 
  tidyr::pivot_wider(names_from = value) |> 
  purrr::map(gen_x) 

df <- tibble::tibble(y = rbinom(1000, 1, 0.5))

seed <- 1309
set.seed(seed)
furrr_options <- furrr::furrr_options(seed = seed)
future::plan(future::multisession, workers = 16)

# no problems
r <- list_of_feature_dfs |> 
  furrr::future_map(\(feature) nested_map(imputed_versions_feature = feature, 
                                          df_other_vars = df,
                                          formula = 'y ~ x'),
                    .progress = TRUE,
                    .options = furrr_options)

# same as above but via a function call: the CPUs never really get going, memory keeps increasing, and it doesn't finish
wrapper <- function(list_of_feature_dfs, formula, df_other_vars, furrr_options) {
  results <- list_of_feature_dfs %>% 
    furrr::future_map(\(feature) nested_map(imputed_versions_feature = feature, 
                                            df_other_vars = df_other_vars,
                                            formula = formula),
                      .progress = TRUE,
                      .options = furrr_options)
  return(results)
}

r_function <- wrapper(list_of_feature_dfs, 'y ~ x', df, furrr_options = furrr_options)

Session info:

> sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2019 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

time zone: Etc/UTC
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] tibble_3.2.1  tidyr_1.3.1   dplyr_1.1.4   purrr_1.0.2  
[5] furrr_0.3.1   future_1.34.0

loaded via a namespace (and not attached):
 [1] digest_0.6.37     utf8_1.2.4        R6_2.5.1         
 [4] codetools_0.2-19  tidyselect_1.2.1  magrittr_2.0.3   
 [7] glue_1.8.0        parallel_4.3.2    pkgconfig_2.0.3  
[10] generics_0.1.3    lifecycle_1.0.4   cli_3.6.3        
[13] fansi_1.0.6       parallelly_1.38.0 vctrs_0.6.5      
[16] compiler_4.3.2    globals_0.16.3    rstudioapi_0.16.0
[19] tools_4.3.2       listenv_0.9.1     pillar_1.9.0     
[22] rlang_1.1.4

I think the nested map is not the main culprit for me. It is when I put the future call inside a function that I really run into this issue: the CPUs never really get going, but memory keeps increasing. In fact, I run out of 32GB of memory before the code is close to finishing. I have been able to reproduce this consistently on three different machines (Windows, Windows Server, and a Docker container running Ubuntu via WSL). Any ideas? Or anything I should look into? Thanks!
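
For anyone landing here with the same symptom, the point about environments from the update at the top of this answer can be illustrated without any future code at all. The sketch below is my reading of the linked gotchas article, not a confirmed diagnosis of the reprex. A function created inside another function keeps a reference to that function's whole frame, so serializing it, which has to happen before it can be sent to a multisession worker, also serializes any large objects in that frame. In wrapper() above, the anonymous function passed to future_map is created inside wrapper(), whose frame also holds list_of_feature_dfs. The names serialized_size, make_heavy, and make_light are made up for the illustration, and resetting the environment to globalenv() is just one base R way to drop the reference; the carrier article linked above describes a more careful approach.

```r
# Size in bytes of a function once serialized, enclosing environment included
serialized_size <- function(f) length(serialize(f, NULL))

make_heavy <- function() {
  big <- rnorm(1e6)      # about 8 MB, never used by the returned function
  function(v) v + 1      # the closure keeps make_heavy()'s frame, and `big`, alive
}

make_light <- function() {
  big <- rnorm(1e6)
  f <- function(v) v + 1
  environment(f) <- globalenv()  # drop the reference to this frame
  f
}

heavy <- serialized_size(make_heavy())
light <- serialized_size(make_light())

cat("heavy:", heavy, "bytes; light:", light, "bytes\n")
```

Both functions compute the same thing, but only the first one carries the unused 8 MB vector along when it is serialized.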