Verification that Parallel Compute works on Windows?

Question

Verification that Parallel Compute works on Windows?

Closed this issue 2 months ago · 14 comments

Apologies for this not containing a reprex on initial reporting.

Has it been verified that the proposed use of the 'future' package and the 'comparisons' function actually results in parallel computing in a Windows 11 environment? Based on both the runtime and reviewing CPU utilization, I see no evidence that parallel compute is actually happening? below is some sample code and I can make a reprex if it is helpful, I am just checking to see if others have actually verified that it "works" at a basic level.

library(gamlss)
library(marginaleffects)
library(future)

plan(cluster, workers=3) #also attempted this with 'multisession, but not real difference
options("marginaleffects_parallel" = TRUE)

xx <- comparisons(gamlss_model, newdata = datagrid(age= 40, mile_num= c(1,3,6), sbj=0),
variables = 'sex', what= 'mu', vcov = 'HAC')

For what it's worth, I've tried doing this "manually" with a foreach loop but it errors, I think because of a 'gamlss' package issue. I haven't chased that rabbit down the hole yet.

Answer 1 · 2024-09-30T13:26:03.000Z

No, I cannot test on windows.

But note that parallelism is applied over parameters. So if your model has very few coefficients, you wont' see much (if any) difference. Also, parallelism is very hard in this context because the data-passing overhead is high. I have personally not seen that much benefit, except in specific cases, for models with very many parameters.

Answer 2 · 2024-09-30T13:32:08.000Z

I see. Thanks for the response. As a quick follow-on, is the parallelism applied over "parameters" or "coefficients"? My thought is that a model with splines may only have a few parameters but many coefficients.

Just as an FYI, I don't think is parallelizing in a Windows environment. If you have a suggestion on verifying this assessment more analytically, I'm happy to do it for you since you can't test in Windows.

Answer 3 · 2024-09-30T13:35:36.000Z

Sorry: coefficients.

You can extract the jacobian with:

k = comparisons(model)
attr(k, "jacobian")

And the parallelized loop is over the columns of that.

Not super "analytical", but you can perhaps try the example here: https://marginaleffects.com/vignettes/performance.html#parallel-computation

Answer 4 · 2024-09-30T13:55:46.000Z

Thanks. Will report back, hopefully later this week.

Answer 5 · 2024-09-30T16:19:15.000Z

OK, yes. Confirmed that parallel compute does not work on Windows 11 system. See Reprex based on your example:

library(mgcv)
library(tictoc)
library(future)
library(nycflights13)
library(marginaleffects)
data("flights")
packageVersion("marginaleffects")
#> [1] '0.22.0.0'

cores <- 8
plan(multisession, workers = cores) #similar result with 'cluster' argument

flights <- flights |>
transform(date = as.Date(paste(year, month, day, sep = "/"))) |>
transform(date.num = as.numeric(date - min(date))) |>
transform(wday = as.POSIXlt(date)$wday) |>
transform(time = as.POSIXct(paste(hour, minute, sep = ":"), format = "%H:%M")) |>
transform(time.dt = difftime(time, as.POSIXct('00:00', format = '%H:%M'), units = 'min')) |>
transform(time.num = as.numeric(time.dt)) |>
transform(dep_delay = ifelse(dep_delay < 0, 0, dep_delay)) |>
transform(dep_delay = ifelse(is.na(dep_delay), 0, dep_delay)) |>
transform(carrier = factor(carrier)) |>
transform(dest = factor(dest)) |>
transform(origin = factor(origin))

model <- bam(dep_delay ~ s(date.num, bs = "cr") +
s(wday, bs = "cc", k = 3) +
s(time.num, bs = "cr") +
s(carrier, bs = "re") +
origin +
s(distance, bs = "cr") +
s(dest, bs = "re"),
data = flights,
family = poisson,
discrete = TRUE,
nthreads = cores)

tic()
p1 <- predictions(model, vcov = FALSE)
toc()
#> 0.37 sec elapsed

options("marginaleffects_parallel" = TRUE)

tic()
p1 <- predictions(model)
toc()
#> 46.69 sec elapsed
#>

options("marginaleffects_parallel" = FALSE)

tic()
p2 <- predictions(model)
toc()
#> 48.08 sec elapsed

Answer 6 · 2024-09-30T16:23:03.000Z

Does the future.apply package work as expected on your machine? For other tasks, I mean.

Answer 7 · 2024-09-30T16:42:55.000Z

Just to provide additional information. I've confirmed that above reprex for 'marginaleffects' on two different Windows 11 machines, one Institutionally managed and one a personal computer... so it seems unlikely to be machine-specific for Windows 11.

I've now confirmed on one of those machines (the institutionally managed one) that the 'future.apply' package appears to be working as desired, based on runtime at least. See below code:

#now just seeing if future.apply works

library(boot)
library(tictoc)

run1 <- function(...) {
cd4.rg <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
boot(cd4, corr, R = 250000, sim = "parametric",
ran.gen = cd4.rg, mle = cd4.mle)
}

tic()
set.seed(123)
cd4.boot <- do.call(c, lapply(1:4, run1))
boot.ci(cd4.boot, type = c("norm", "basic", "perc"),
conf = 0.9, h = atanh, hinv = tanh)
toc()

53.74 sec elapsed

library(future.apply)
plan(multisession, workers = 4) ## Parallelize using four cores

run1 <- function(...) {
cd4.rg <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
boot(cd4, corr, R = 250000, sim = "parametric",
ran.gen = cd4.rg, mle = cd4.mle)
}

tic()
set.seed(123)
cd4.boot <- do.call(c, future_lapply(1:4, run1, future.seed = TRUE))
boot.ci(cd4.boot, type = c("norm", "basic", "perc"),
conf = 0.9, h = atanh, hinv = tanh)
toc()

15.67 sec elapsed

Answer 8 · 2024-09-30T18:31:09.000Z

Thanks for the investigation, I appreciated.

Frankly, I'm stumped. As far as I know, there is no Windows-specific code whatsoever in the parallelization implementation.

See here: https://github.com/vincentarelbundock/marginaleffects/blob/main/R/get_jacobian.R#L49

Answer 9 · 2024-09-30T19:23:53.000Z

Alright, I don't have the bandwidth to diagnose right now, but I'll put it out on social media and see if others' have some time. Once I can look deeper into both 'future' handling and your underlying code, I'll let you know if I can provide anything that seems constructive. It would be good to get this sorted because I know a lot of people use your package due to its ease-of-use and I'm sure a non-nominal amount of them are on Windows machines.

Answer 10 · 2024-10-02T03:09:10.000Z

Hi all,

Initially I was going to confirm that I was getting the same issue on my windows machine. However, after a fresh restart of R I wasn't able to replicate the issue anymore, checking the task manager I could see extra r studio sessions created by plan(multisession) and that they were being utilised when using options("marginaleffects_parallel" = TRUE).

See below my timings from your repex.
packageVersion("future")
[1] ‘1.33.2’
packageVersion("marginaleffects")
[1] ‘0.22.0’

tic()
p1 <- predictions(model, vcov = FALSE)
toc()
0.53 sec elapsed

Three runs of options("marginaleffects_parallel" = TRUE) with
cores <- 8
plan(multisession, workers = cores)
tic()
p1<- predictions(model)
toc()
gave me 38.41, 28.67 and 28.78.

Three runs of options("marginaleffects_parallel" = FALSE)
tic()
p2 <- predictions(model)
toc()
gave me 62.04, 61.05 and 60.75

While not massive, the speed-ups could be worth exploring depending on the use case

One thing of note is after my fresh restart I also ran
rm(list=ls())
gc()
Which I have no clue whether it made a difference or not and obviously i don't want to suggest doing it as it will impact your working enviroment but just something of note.

When it initially wasn;t working on my first attempt I did try the below which seemed to work with a smaller speed up. However, my only concern with the approach below is that identical(p1$estimate, results$estimate) returns false all be it with very small differences.

library(foreach)
library(doParallel)
library(marginaleffects)
library(dplyr)
cores <- detectCores() - 6 # I found less cores/threads worked faster on my machine than cores-1.
cl <- makeCluster(cores)

Split the flights data into chunks
chunks <- split(flights, cut(seq(nrow(flights)), cores, labels = FALSE))

predict_chunk <- function(chunk, model) {
predictions(model, newdata = chunk)
}

tic()
results <- foreach(chunk = chunks, .combine = rbind,
.packages = c("marginaleffects","mgcv")) %dopar% {
predict_chunk(chunk, model)
}
toc()
46.43 sec elapsed

Stop the cluster
stopCluster(cl)

Answer 11 · 2024-10-02T13:27:32.000Z

library(mgcv) library(tictoc) library(future) library(nycflights13) library(marginaleffects) data("flights") packageVersion("marginaleffects") #> [1] '0.22.0.0'

cores <- 8 plan(multisession, workers = cores) #similar result with 'cluster' argument

flights <- flights |> transform(date = as.Date(paste(year, month, day, sep = "/"))) |> transform(date.num = as.numeric(date - min(date))) |> transform(wday = as.POSIXlt(date)$wday) |> transform(time = as.POSIXct(paste(hour, minute, sep = ":"), format = "%H:%M")) |> transform(time.dt = difftime(time, as.POSIXct('00:00', format = '%H:%M'), units = 'min')) |> transform(time.num = as.numeric(time.dt)) |> transform(dep_delay = ifelse(dep_delay < 0, 0, dep_delay)) |> transform(dep_delay = ifelse(is.na(dep_delay), 0, dep_delay)) |> transform(carrier = factor(carrier)) |> transform(dest = factor(dest)) |> transform(origin = factor(origin))

model <- bam(dep_delay ~ s(date.num, bs = "cr") + s(wday, bs = "cc", k = 3) + s(time.num, bs = "cr") + s(carrier, bs = "re") + origin + s(distance, bs = "cr") + s(dest, bs = "re"), data = flights, family = poisson, discrete = TRUE, nthreads = cores)

tic() p1 <- predictions(model, vcov = FALSE) toc() #> 0.37 sec elapsed

options("marginaleffects_parallel" = TRUE)

tic() p1 <- predictions(model) toc() #> 46.69 sec elapsed #>

options("marginaleffects_parallel" = FALSE)

tic() p2 <- predictions(model) toc() #> 48.08 sec elapsed

Thanks for running this and testing on your end. I think I've partially sorted out what may be going on. There appears to be some sort of multi-interaction between package versions, R version and global environments. I also ran everything on a 'clean' restart of both Windows 11, in R version 4.4.1, 'marginaleffects' version 0.22.0.0 and 'future' version 1.34.0.

I think the 'future' version is important here (I say without looking at that package's changelogs). Under 'future' version 1.33.0, the 'mgcv' predictions clearly are NOT parallelizing but are running serially without warnings or errors being thrown.

When I run that same code under 'future' version 1.34.0, the reprex actually errors and will not run, providing the following error:
Error in getGlobalsAndPackages(expr, envir = envir, globals = globals) :
The total size of the 12 globals exported for future expression (‘FUN()’) is 776.54 MiB.. This exceeds the maximum allowed size of 500.00 MiB (option 'future.globals.maxSize'). The three largest globals are ‘FUN’ (352.70 MiB of class ‘function’), ‘func’ (350.11 MiB of class ‘function’) and ‘model’ (39.03 MiB of class ‘list’)

This is a global setting in R that can be adjusted:
futureverse/future#185

and after applying the following line of code, I was able to achieve a modest increase in speed that @R2mu also observed and the Task Manager indicated it was computing in parallel:

options(future.globals.maxSize = 8000 * 1024^2)

All that to say, it's challenging when relying on other packages that are also actively being developed. It appears as though older version of 'future' default to serial computation when exceeding the export size and later versions error instead.
@vincentarelbundock is it worth trying to catch the error in your code and provide a more informative error or to require a higher version of 'future' be used?

Answer 12 · 2024-10-02T15:25:06.000Z

Thanks, this is interesting.

Frankly, I'm not sure I want to catch that error. The error message seems pretty readable and straightforward to me. If people get it, they've explicitly requested parallelism, so they should know what this is about. Moreover, I often feel that packages that catch error messages to rephrase them often end up complicating and obscuring more than helping.

Also, future is not a required/imported package, so I can't enforce a version.

Answer 13 · 2024-10-02T15:30:16.000Z

@vincentarelbundock That all makes sense to me.

I think this only becomes an issue for those running an older version of the 'future' package, as I agree that their message makes sense. It seems odd that the earlier version silently does serial computation instead of throwing an error. I'll close this issue, but hopefully this was informative. Feel free to ping me in the future if there are other apparent Windows-only issues. Cheers!

Answer 14 · 2024-10-02T15:32:53.000Z

Yes, very informative. Thanks a lot!