Nested Futures Use More Memory Than They Should
I've been running code with nested loops that keeps running into memory problems, and I have been trying to come up with a small example that shows the issue. In the example I take a random square matrix and create a list of its columns. Obviously you wouldn't use a double loop to do this in R, but it is hopefully a simple and clear example: with purrr the double loop doesn't increase memory usage, while with furrr and future.apply the memory usage explodes.
library(bench)
library(furrr)
library(future.apply)
library(purrr)
# purrr
single_loop <- function(x, n) {
  map(1:n, ~ x[, .x])
}
# future.apply
single_loop_a <- function(x, n) {
  future_lapply(1:n, FUN = function(i) x[, i])
}
# furrr
single_loop_f <- function(x, n) {
  future_map(1:n, ~ x[, .x])
}
# purrr
# (x has no default here: `x = x` would be a recursive default argument
# reference, and every call below passes x explicitly anyway)
inner_loop <- function(i, n, x) {
  map_dbl(1:n, ~ x[.x, i])
}
outer_loop <- function(x, n) {
  map(1:n, ~ inner_loop(.x, n, x = x))
}
# future.apply
inner_loop_a <- function(i, n, x) {
  future_sapply(1:n, FUN = function(j) x[j, i])
}
outer_loop_a <- function(x, n) {
  future_lapply(1:n, FUN = function(i) inner_loop_a(i, n, x))
}
# furrr
inner_loop_f <- function(i, n, x) {
  future_map_dbl(1:n, ~ x[.x, i])
}
outer_loop_f <- function(x, n) {
  future_map(1:n, ~ inner_loop_f(.x, n, x = x))
}
n <- 100
x <- matrix(rnorm(n * n), nrow = n)
identical(single_loop(x, n), single_loop_f(x, n))
identical(single_loop(x, n), single_loop_a(x, n))
identical(single_loop(x, n), outer_loop(x, n))
identical(single_loop(x, n), outer_loop_a(x, n))
identical(single_loop(x, n), outer_loop_f(x, n))
# All return TRUE
plan(sequential)
# With a single loop memory usage is similar
bench::mark(single_loop(x, n))$mem_alloc
# 127KB
bench::mark(single_loop_a(x, n))$mem_alloc
# 243KB
bench::mark(single_loop_f(x, n))$mem_alloc
# 340KB
# With a double loop memory usage remains similar for purrr, but explodes
# on the other two
bench::mark(outer_loop(x, n))$mem_alloc
# 83.6KB
bench::mark(outer_loop_a(x, n))$mem_alloc
# 11.8MB
bench::mark(outer_loop_f(x, n))$mem_alloc
# 21.1MB
# Try again with a larger matrix
n <- 5000
x <- matrix(rnorm(n * n), nrow = n)
bench::mark(single_loop(x, n))$mem_alloc
# 287MB
bench::mark(single_loop_a(x, n))$mem_alloc
# 287MB
bench::mark(single_loop_f(x, n))$mem_alloc
# 287MB
bench::mark(outer_loop(x, n))$mem_alloc
# 191MB
bench::mark(outer_loop_a(x, n))$mem_alloc
# 2.88GB
bench::mark(outer_loop_f(x, n))$mem_alloc
# 1.57GB
As you can see, using the double loop actually decreases memory usage somewhat for purrr, although it stays in the same range, but causes memory usage to explode for furrr and future.apply. I ran this example on a 2023 MacBook, but the actual code that I am trying to fix has been running on a Linux cluster. I ran this example using both furrr and future.apply because yesterday I logged a bug report about nested loops using future.callr, and @HenrikBengtsson pointed out that it was only an issue with furrr. Please let me know if there is any additional information I can provide or help I can give in solving this issue, and thanks for the wonderful collection of packages!
A little more information. I don't know much about memory profiling, so apologies if this is not the best way to present the information, but in the hopes it might be helpful...
library(profmem)
library(tidyverse)
n <- 100
x <- matrix(rnorm(n * n), nrow = n)
single_loop(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
# allocs total_bytes
# 1 201 130448
single_loop_a(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
# allocs total_bytes
# 1 364 251360
single_loop_f(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
# allocs total_bytes
# 1 475 353128
outer_loop(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
# allocs total_bytes
# 1 101 85648
outer_loop_a(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
# allocs total_bytes
# 1 17101 12623144
outer_loop_f(x, n) |>
  profmem() |>
  filter(what == "alloc") |>
  summarise(
    allocs = n(),
    total_bytes = sum(bytes)
  )
# allocs total_bytes
# 1 27812 22595240
Update: I fixed my problem. Turns out I was not familiar with how future handles environments. See https://furrr.futureverse.org/articles/gotchas.html and https://furrr.futureverse.org/articles/carrier.html
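For anyone landing here later, the general mechanism behind that kind of gotcha can be sketched in base R, without any of the future packages. A function created inside another function keeps a reference to the environment it was created in, and serializing the function (roughly what has to happen before it can be sent to a worker) drags that environment along. The names `make_task`, `big`, and `task_trimmed` are made up for this illustration, and this is only a sketch of the idea, not a statement about what furrr does internally in the examples above.

```r
# Hypothetical factory: the returned function never uses `big`,
# but `big` lives in the environment the function was created in.
make_task <- function() {
  big <- rnorm(1e6)      # about 8 MB of doubles, local to this call
  function(i) i + 1
}

task <- make_task()

# serialize() stands in for shipping the function to another R process.
size_with_env <- length(serialize(task, NULL))

# Pointing the function at the global environment drops the reference to
# `big`. This is only safe here because the body uses no free variables.
task_trimmed <- task
environment(task_trimmed) <- globalenv()
size_trimmed <- length(serialize(task_trimmed, NULL))

cat("with enclosing environment:", size_with_env, "bytes\n")
cat("after resetting environment:", size_trimmed, "bytes\n")
```

The first size is dominated by `big` even though the function body never mentions it. As far as I understand it, the carrier article linked above describes a more controlled way of building functions that only carry the data they are explicitly given.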
I think I ran into the same problem or at least a very similar problem. Apologies for the somewhat convoluted data reconstruction but it's a simulation of the data that I used when I first encountered it. Here's my reprex:
library(future)
library(furrr)
library(purrr)
logistic_model <- function(feature, df_other_vars, formula) {
  df <- dplyr::bind_cols(df_other_vars, "x" = feature)
  m <- glm(formula(formula),
           data = df,
           family = binomial(logit))
  return(m)
}
nested_map <- function(imputed_versions_feature, ...) {
  models <- imputed_versions_feature |>
    purrr::map(\(imputed_version_feature)
               logistic_model(feature = imputed_version_feature, ...))
  return(models[1]) # originally mice::pool call, but not necessary for demonstration
}
gen_names <- function(n = 1) {
  mz <- runif(min = 10, max = 200, n = n) |> signif(7)
  rt <- runif(min = 0, max = 12, n = n) |> signif(7)
  string <- glue::glue("X{mz}_{rt}")
  return(string)
}
gen_x <- function(dummy, nr_imputations = 60, n = 1000) {
  x <- replicate(nr_imputations, rnorm(n)) |> tibble::as_tibble()
}
list_of_feature_dfs <- gen_names(1024) |>
  tibble::as_tibble() |>
  tidyr::pivot_wider(names_from = value) |>
  purrr::map(gen_x)
df <- tibble::tibble(y = rbinom(1000, 1, 0.5))
seed <- 1309
set.seed(seed)
furrr_options <- furrr::furrr_options(seed = seed)
future::plan(future::multisession, workers = 16)
# no problems
r <- list_of_feature_dfs |>
  furrr::future_map(\(feature) nested_map(imputed_versions_feature = feature,
                                          df_other_vars = df,
                                          formula = 'y ~ x'),
                    .progress = TRUE,
                    .options = furrr_options)
# same as above but via a function call: the CPUs never really get going,
# memory keeps increasing, and it doesn't finish
wrapper <- function(list_of_feature_dfs, formula, df_other_vars, furrr_options) {
  results <- list_of_feature_dfs %>%
    furrr::future_map(\(feature) nested_map(imputed_versions_feature = feature,
                                            df_other_vars = df_other_vars,
                                            formula = formula),
                      .progress = TRUE,
                      .options = furrr_options)
  return(results)
}
r_function <- wrapper(list_of_feature_dfs, 'y ~ x', df, furrr_options = furrr_options)
Session info:
> sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2019 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
time zone: Etc/UTC
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] tibble_3.2.1 tidyr_1.3.1 dplyr_1.1.4 purrr_1.0.2
[5] furrr_0.3.1 future_1.34.0
loaded via a namespace (and not attached):
[1] digest_0.6.37 utf8_1.2.4 R6_2.5.1
[4] codetools_0.2-19 tidyselect_1.2.1 magrittr_2.0.3
[7] glue_1.8.0 parallel_4.3.2 pkgconfig_2.0.3
[10] generics_0.1.3 lifecycle_1.0.4 cli_3.6.3
[13] fansi_1.0.6 parallelly_1.38.0 vctrs_0.6.5
[16] compiler_4.3.2 globals_0.16.3 rstudioapi_0.16.0
[19] tools_4.3.2 listenv_0.9.1 pillar_1.9.0
[22] rlang_1.1.4
I think the nested map is not the main culprit for me. It's when I put the future call inside a function that I really run into this issue: the CPUs never really get going, but the memory keeps increasing. In fact, I run out of 32GB of memory before the code is close to finishing. I have been able to reproduce this consistently on three different machines (Windows, Windows Server, and a Docker container running Ubuntu via WSL). Any ideas? Or anything I should look into? Thanks!
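One thing that may be worth checking is what the anonymous function inside `wrapper()` can reach through its enclosing environment, since that function is created inside the call and so sits next to all of the call's arguments. Below is a minimal base R sketch for inspecting this. `closure_env_sizes()` and `toy_wrapper()` are hypothetical helpers written for this illustration, and the data is a scaled-down stand-in for the reprex. It only shows what is reachable from the function. Whether furrr actually exports all of it to each worker is an assumption that would need to be verified separately.

```r
# Hypothetical helper: size in bytes of every object stored in the
# enclosing environment of `f`, largest first.
closure_env_sizes <- function(f) {
  env <- environment(f)
  nms <- ls(env, all.names = TRUE)
  sizes <- vapply(
    nms,
    function(nm) as.numeric(object.size(get(nm, envir = env))),
    numeric(1)
  )
  sort(sizes, decreasing = TRUE)
}

# Scaled-down stand-in for wrapper(): `task` is created inside the call,
# so its environment holds every argument of the call (and `task` itself).
toy_wrapper <- function(list_of_feature_dfs, formula, df_other_vars) {
  task <- function(feature) nrow(feature) + nrow(df_other_vars)
  closure_env_sizes(task)
}

toy_features <- replicate(20, data.frame(x = rnorm(1000)), simplify = FALSE)
toy_df <- data.frame(y = rbinom(1000, 1, 0.5))

sizes <- toy_wrapper(toy_features, "y ~ x", toy_df)
print(sizes)
```

In this toy version the full `list_of_feature_dfs` is the largest object reachable from `task`, even though `task` only ever needs one element of it at a time. If the same holds in the real `wrapper()`, the function environments section of the furrr gotchas article linked earlier in this thread would be the place to start.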