global r6 instances are modified and unusable in multisession
Closed this issue · 7 comments
when a R6 instance, from a R6 class defined in a source package (i.e. loaded using devtools::load_all()
) is passed using future.globals
, the instance is modified in a way where the parent environment of methods is no longer the namespace they are defined in.
Reproducible example
library(usethis)
create_package('r6bug') # N.B: change the wd
writeLines('MyR6 <- R6::R6Class("MyR6")', 'R/r6.R')
PKG_PATH <- normalizePath('.')
devtools::load_all(PKG_PATH, export_all = TRUE)
fun <- function(x) {
devtools::load_all(PKG_PATH, export_all = TRUE)
ns <- parent.env(environment(my_r6$clone))
if (isNamespace(ns)) getNamespaceName(ns) else NA
}
my_r6 <- MyR6$new()
getNamespaceName(parent.env(environment(my_r6$clone)))
library(future)
library(future.apply)
plan(sequential)
future_lapply(1:2, fun, future.globals = list(my_r6 = my_r6, PKG_PATH = PKG_PATH), future.packages = 'devtools')
# ℹ Loading r6bug
# ℹ Loading r6bug
# [[1]]
# name
# "r6bug"
# [[2]]
# name
# "r6bug"
plan(multicore)
future_lapply(1:2, fun, future.globals = list(my_r6 = my_r6, PKG_PATH = PKG_PATH), future.packages = 'devtools')
# ℹ Loading r6bug
# ℹ Loading r6bug
# [[1]]
# name
# "r6bug"
# [[2]]
# name
# "r6bug"
plan(multisession)
future_lapply(1:2, fun, future.globals = list(my_r6 = my_r6, PKG_PATH = PKG_PATH), future.packages = 'devtools')
# ℹ Loading r6bug
# ℹ Loading r6bug
# [[1]]
# [1] NA
# [[2]]
# [1] NA
Expected behavior
parent.env(environment(my_r6$clone))
in multisession should be the "r6bug" namespace
work-around
Currently, I backup the prior correct methods namespace, and attach it as an attribute, then restore it in the future_lappy FUN
function.
Session information
> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] r6bug_0.0.0.9000 devtools_2.4.3 future.apply_1.8.1 future_1.23.0
[5] usethis_2.1.5
loaded via a namespace (and not attached):
[1] pillar_1.7.0 compiler_4.1.3 remotes_2.4.2 prettyunits_1.1.1
[5] tools_4.1.3 digest_0.6.29 testthat_3.1.3 pkgbuild_1.3.1
[9] pkgload_1.2.4 jsonlite_1.8.0 memoise_2.0.1 lifecycle_1.0.1
[13] tibble_3.1.6 pkgconfig_2.0.3 rlang_1.0.2 cli_3.2.0
[17] rstudioapi_0.13 parallel_4.1.3 xfun_0.30 fastmap_1.1.0
[21] withr_2.5.0 stringr_1.4.0 roxygen2_7.1.2 xml2_1.3.3
[25] knitr_1.38 globals_0.14.0 desc_1.4.1 fs_1.5.2
[29] vctrs_0.4.1 rprojroot_2.0.3 glue_1.6.2 listenv_0.8.0
[33] R6_2.5.1 processx_3.5.3 parallelly_1.30.0 fansi_1.0.3
[37] sessioninfo_1.2.2 purrr_0.3.4 callr_3.7.0 magrittr_2.0.3
[41] whisker_0.4 codetools_0.2-18 ps_1.6.0 ellipsis_0.3.2
[45] utf8_1.2.2 stringi_1.7.6 cachem_1.0.6 crayon_1.5.1
[49] brio_1.1.3
I don't understand the root issues, but I do remember that I've had issues with 'multisession' and devtools::load_all()
in the past, and I think it's due to the way that devtools::load_all()
emulates attaching an installed package (that is not really installed). Thus, I can't comment on the underlying behavior, and you mention you already have a workaround, but here is a different workaround in case it helps.
It relies on devtools::dev_mode()
and remotes::install_local()
. I had to make one change to the body of your code which is to change your call to MyR6
to be r6bug:::MyR6
, since there is no argument to install_local()
like export_all
.
Below is the full example, and all plans give the same output for me.
library(usethis)
create_package('r6bug') # N.B: change the wd
writeLines('MyR6 <- R6::R6Class("MyR6")', 'R/r6.R')
PKG_PATH <- normalizePath('.')
#devtools::load_all(PKG_PATH, export_all = TRUE)
temp_d <- tempfile()
dir.create(temp_d)
devtools::dev_mode(on = TRUE, path = temp_d)
remotes::install_local(PKG_PATH, upgrade = "never")
library("r6bug")
fun <- function(x) {
devtools::load_all(PKG_PATH, export_all = TRUE)
ns <- parent.env(environment(my_r6$clone))
if (isNamespace(ns)) getNamespaceName(ns) else NA
}
my_r6 <- r6bug:::MyR6$new()
getNamespaceName(parent.env(environment(my_r6$clone)))
library(future)
library(future.apply)
plan(sequential)
future_lapply(1:2, fun, future.globals = list(my_r6 = my_r6, PKG_PATH = PKG_PATH), future.packages = 'devtools')
plan(multicore)
future_lapply(1:2, fun, future.globals = list(my_r6 = my_r6, PKG_PATH = PKG_PATH), future.packages = 'devtools')
plan(multisession)
future_lapply(1:2, fun, future.globals = list(my_r6 = my_r6, PKG_PATH = PKG_PATH), future.packages = 'devtools')
Hi @scottkosty , and thanks for your quick reply.
Your work-around is not really applicable for me. Here it is a reprex, but as you can imagine my real problem is a bit more complex, and involves a dozen of source packages. It will take more time to install them than to actually do the computations in sequential mode.
@kforner Makes sense!
Hi. There's no need to involved future.apply for a minimal example. Here's the same with just future:
opwd <- getwd()
usethis::create_package("r6bug") # N.B: change the wd
writeLines("MyR6 <- R6::R6Class('MyR6')", "R/r6.R")
PKG_PATH <- normalizePath(".")
devtools::load_all(PKG_PATH, export_all = TRUE)
fun <- function(x) {
devtools::load_all(PKG_PATH, export_all = TRUE)
parent.env(environment(my_r6$clone))
}
my_r6 <- MyR6$new()
penv <- parent.env(environment(my_r6$clone))
print(penv)
## <environment: namespace:r6bug>
print(getNamespaceName(penv))
## name
## "r6bug"
library(future)
globals <- list(fun = fun, my_r6 = my_r6, PKG_PATH = PKG_PATH)
packages <- "devtools"
plan(sequential)
f <- future(fun(1), globals = globals, packages = packages)
v <- value(f)
print(v)
## <environment: namespace:r6bug>
plan(multisession, workers = I(1))
f <- future(fun(1), globals = globals, packages = packages)
v <- value(f)
print(v)
## <environment: R_GlobalEnv>
-
Before anything else, are you using
devtools::load_all()
here to create a reproducible example, or do you actually use that in "production"? Have you checked the behavior whenMyR6
lives in a proper R package? -
FWIW, reconstructing environment hierarchies on parallel R processes are really hard, especially if one wants to cover all cases. It basically requires emulating how the R run-time does it. So, this might not be an issue if you use a proper R package that is available to all parallel R processes.
PS. Next, I'll migrate this issue over to the future issue tracker.
Hi Henrik
-
Yes, I use (indirectly)
devtools::load_all()
in production. It's not just for the reprex.- I have not checked in a proper R package, but I suspect it should work. I understand it is a corner case.
-
I understand that. The thing is, I can not pass the related packages using
future.packages=
since there are not loaded in a "standard" way. If it were possible somehow to "customize" the loading offuture.packages=
to sub-processes prior to sending/serializing the globals, I suppose it would work. -
I can work-around that issue. I posted it in case you can see a way to generally improve the code.
Thanks for all your work. Much appreciated.
I see. No, I think you need to reconstruct the environment hierarchy yourself. The "namespace" environment created by devtools::load_all(PKG_PATH, export_all = TRUE)
in the main R session is different from that in the parallel worker. The only thing that would hint that they are related is the name attribute, but it's basically only you as a developer that know they're the same. In contrast, with proper R packages, we make assumptions that a package namespace in the main R session is the same as the package namespace in a parallel worker if they share the same package name.
It would require almost mind-reading skills to know that the parent environment of object my_r6
that was created in the main R session, should be bound to a similar "namespace" environment on the parallel worker.
Regarding:
I use (indirectly)
devtools::load_all()
in production.
My only remaining recommendation is to take a deep dive into that and see you if instead could at least build a proper package on-the-fly. R doesn't have in-memory package building/namespace creation and devtools::load_all()
is really a hack that I doubt can fully emulate the real scenario.
Thank you. I'll think about it.