wlandau/crew

crew worker does not actually heed `reset_packages = FALSE`, `reset_globals = FALSE` etc

Closed this issue · 5 comments

Hi crew devs,
I think I've encountered a bug based on what the documentation suggests for reset_packages & reset_globals arguments to crew_controller_local. Reprexes below

Prework

  • Read and agree to the Contributor Code of Conduct and contributing guidelines.
  • Confirm that your issue is a genuine bug in the crew package itself and not a user error, known limitation, or issue from another package that crew depends on. For example, if you get errors running tar_make_clustermq(), try isolating the problem in a reproducible example that runs clustermq and not crew. And for miscellaneous troubleshooting, please post to discussions instead of issues.
  • If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
  • Post a minimal reproducible example like this one so the maintainer can troubleshoot the problems you identify. A reproducible example is:
    • Runnable: post enough R code and data so any onlooker can create the error on their own computer.
    • Minimal: reduce runtime wherever possible and remove complicated details that are irrelevant to the issue at hand.
    • Readable: format your code according to the tidyverse style guide.

Description

Every task allocated to the worker resets the packages and globals, there is nothing that persists between tasks pushed to the worker.

Reproducible example for globals

# Start the crew
crew <-
  crew::crew_controller_local(
    "user",
    seconds_idle = 10,,
    local_log_directory = "crew_logs",
    reset_globals = FALSE
  )
crew$start()

crew$push(
  name = "add_the_data",
  {
    the_data
    
  },
  globals = list(the_data = "test")
)
# Pop till this completes
a <- crew$pop()

crew$push(
  name = "verify_the_data",
  the_data
)
# Pop till this completes
a <- crew$pop()

crew$push(
  name = "verify_the_data_again",
  the_data
)
# Pop till this completes
a <- crew$pop()

Expected result

the_data persists as a global on the worker between subsequent tasks allocated to the worker.
However, though it will occasionally persist on one subsequent task verify_the_data, by the final task: verify_the_data_again The result will be NA because the global has disappeared.

The same is true for packages.

Reprex for packages

usethis::create_package("../testPackage", open = FALSE)
code <- "my_env <- new.env()
assign_var <- function(value) {
  if (!missing(value))
    my_env$obj <- value
  return(my_env$obj)
}"
write(code, "../testPackage/R/test.R")
devtools::build("../testPackage", "test_package.tar.gz")
devtools::install_local("test_package.tar.gz")
crew <-
  crew::crew_controller_local(
    "user",
    seconds_idle = 10,,
    local_log_directory = "crew_logs",
    reset_globals = FALSE
  )
crew$start()

crew$push(
  name = "assign_value",
  {
    testPackage:::assign_var(value = value)
  },
  data = list(value = "test")
)
# Pop till this completes
a <- crew$pop()

crew$push(
  name = "verify_the_value",
  testPackage:::assign_var()
)
# Pop till this completes
a <- crew$pop()
crew$push(
  name = "verify_the_value_again",
  testPackage:::assign_var()
)
# Pop till this completes
a <- crew$pop()

Expected result

The variable assigned to the value in the internal environment of the package persists between tasks at least once, but it returns NULL by the second task.

Diagnostic information

Diagnostic information
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.0 (2023-04-21)
 os       macOS 14.2.1
 system   aarch64, darwin20
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2024-01-25
 rstudio  2023.12.0+369 Ocean Storm (desktop)
 pandoc   NA

─ Packages ───────────────────────────────────────────────────────────────────
 !  package     * version    date (UTC) lib source
 P  attempt       0.3.1      2020-05-03 [?] CRAN (R 4.3.0)
    cachem        1.0.8      2023-05-01 [1] CRAN (R 4.3.0)
    callr         3.7.3      2022-11-02 [1] CRAN (R 4.3.0)
 VP cli           3.6.1      2023-12-11 [?] CRAN (R 4.3.1) (on disk 3.6.2)
 P  config        0.3.1      2020-12-17 [?] CRAN (R 4.3.0)
 P  crayon        1.5.2      2022-09-29 [?] CRAN (R 4.3.0)
    crew          0.8.0.9002 2024-01-25 [1] Github (wlandau/crew@f31eea1)
 P  data.table    1.14.10    2023-12-08 [?] CRAN (R 4.3.1)
    devtools      2.4.5      2022-10-11 [1] CRAN (R 4.3.0)
 P  digest        0.6.33     2023-07-07 [?] CRAN (R 4.3.0)
    DMDUAPI       2.7.0      2024-01-25 [1] local (/Users/stephenholsenbeck/Documents/R/Contributor_Repos/VirgaLabs/dmdu/dmduapi.tar.gz)
    ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.3.0)
 VP fansi         1.0.4      2023-12-08 [?] CRAN (R 4.3.1) (on disk 1.0.6)
    fastmap       1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
 P  fs            1.6.3      2023-07-20 [?] CRAN (R 4.3.0)
 P  getip         0.1-4      2023-12-10 [?] CRAN (R 4.3.1)
 VP glue          1.6.2      2024-01-09 [?] CRAN (R 4.3.1) (on disk 1.7.0)
 P  golem         0.4.1      2023-06-05 [?] CRAN (R 4.3.0)
 P  htmltools     0.5.7      2023-11-03 [?] CRAN (R 4.3.1)
    htmlwidgets   1.6.2      2023-03-17 [1] CRAN (R 4.3.0)
 P  httpuv        1.6.12     2023-10-23 [?] CRAN (R 4.3.1)
    later         1.3.1      2023-05-02 [1] CRAN (R 4.3.0)
 VP lifecycle     1.0.3      2023-11-07 [?] CRAN (R 4.3.1) (on disk 1.0.4)
 P  magrittr      2.0.3      2022-03-30 [?] CRAN (R 4.3.0)
    memoise       2.0.1      2021-11-26 [1] CRAN (R 4.3.0)
    mime          0.12       2021-09-28 [1] CRAN (R 4.3.0)
    miniUI        0.1.1.1    2018-05-18 [1] CRAN (R 4.3.0)
 P  mirai         0.12.0     2024-01-12 [?] CRAN (R 4.3.1)
 P  nanonext      0.12.0     2024-01-10 [?] CRAN (R 4.3.1)
 P  pillar        1.9.0      2023-03-22 [?] CRAN (R 4.3.0)
 P  pkgbuild      1.4.2      2023-06-26 [?] CRAN (R 4.3.0)
 P  pkgconfig     2.0.3      2019-09-22 [?] CRAN (R 4.3.0)
 P  pkgload       1.3.3      2023-09-22 [?] CRAN (R 4.3.1)
 VP plyr          1.8.8      2023-10-02 [?] CRAN (R 4.3.1) (on disk 1.8.9)
    prettyunits   1.2.0      2023-09-24 [1] CRAN (R 4.3.1)
 P  processx      3.8.3      2023-12-10 [?] CRAN (R 4.3.1)
 P  profvis       0.3.8      2023-05-02 [?] CRAN (R 4.3.0)
    promises      1.2.1      2023-08-10 [1] CRAN (R 4.3.0)
 P  ps            1.7.6      2024-01-18 [?] CRAN (R 4.3.1)
 P  purrr         1.0.2      2023-08-10 [?] CRAN (R 4.3.0)
 P  R6            2.5.1      2021-08-19 [?] CRAN (R 4.3.0)
 P  Rcpp          1.0.11     2023-07-06 [?] CRAN (R 4.3.0)
 P  remotes       2.4.2.1    2023-07-18 [?] CRAN (R 4.3.0)
 P  renv          1.0.3      2023-09-19 [?] CRAN (R 4.3.1)
 VP rlang         1.1.1      2024-01-10 [?] CRAN (R 4.3.1) (on disk 1.1.3)
 P  rstudioapi    0.15.0     2023-07-07 [?] CRAN (R 4.3.0)
    sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
 P  shiny         1.8.0      2023-11-17 [?] CRAN (R 4.3.1)
 VP stringi       1.7.12     2023-12-11 [?] CRAN (R 4.3.1) (on disk 1.8.3)
 VP stringr       1.5.0      2023-11-14 [?] CRAN (R 4.3.1) (on disk 1.5.1)
 P  tibble        3.2.1      2023-03-20 [?] CRAN (R 4.3.0)
    tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
    urlchecker    1.0.1      2021-11-30 [1] CRAN (R 4.3.0)
    usethis       2.2.2      2023-07-06 [1] CRAN (R 4.3.0)
 VP utf8          1.2.3      2023-10-22 [?] CRAN (R 4.3.1) (on disk 1.2.4)
    UU            1.48.0     2024-01-22 [1] Github (yogat3ch/UU@963e49e)
 P  vctrs         0.6.5      2023-12-01 [?] CRAN (R 4.3.1)
 P  virgaUtils    0.5.0      2023-05-13 [?] Github (Martin-McCoy/virgaUtils@3dbd357)
    xtable        1.8-4      2019-04-21 [1] CRAN (R 4.3.0)
 P  yaml          2.3.7      2023-01-23 [?] CRAN (R 4.3.0)

 [1] /Users/stephenholsenbeck/Library/Caches/org.R-project.R/R/renv/library/dmdu-4393acb2/R-4.3/aarch64-apple-darwin20
 [2] /Users/stephenholsenbeck/Library/Caches/org.R-project.R/R/renv/sandbox/R-4.3/aarch64-apple-darwin20/ac5c2659

 V ── Loaded and on-disk version mismatch.
 P ── Loaded and on-disk path mismatch.

──────────────────────────────────────────────────────────────────────────────

GithubSHA: f31eea1 Latest

I think reset_globals is working properly. If you set reset_globals = FALSE, you may still see some globals go away if your worker exits and then relaunches. The aggressive timeout of seconds_idle = 10 may have caused this on your end. Here is what I see:

crew <- crew::crew_controller_local(
  name = "user",
  workers = 1,
  local_log_directory = "crew_logs",
  reset_globals = FALSE
)
crew$start()

crew$push(
  name = "add_the_data",
  command = the_data,
  globals = list(the_data = "test")
)

crew$wait()
a <- crew$pop()
a$result[[1]]
#> [1] "test"
a$error
#> [1] NA

crew$push(
  name = "verify_the_data",
  the_data
)
crew$wait()
b <- crew$pop()
b$result[[1]]
#> [1] "test"
b$error
#> [1] NA

crew$push(
  name = "verify_the_data_again",
  the_data
)
crew$wait()
c <- crew$pop()
c$result[[1]]
#> [1] "test"
c$error
#> [1] NA

Created on 2024-01-26 with reprex v2.1.0

And with reset_globals = TRUE, we rightly get the opposite result because the_data is cleaned up.

crew <- crew::crew_controller_local(
  name = "user",
  workers = 1,
  local_log_directory = "crew_logs",
  reset_globals = TRUE
)
crew$start()

crew$push(
  name = "add_the_data",
  command = the_data,
  globals = list(the_data = "test")
)

crew$wait()
a <- crew$pop()
a$result[[1]]
#> [1] "test"
a$error
#> [1] NA

crew$push(
  name = "verify_the_data",
  the_data
)
crew$wait()
b <- crew$pop()
b$result[[1]]
#> [1] NA
b$error
#> [1] "object 'the_data' not found"

crew$push(
  name = "verify_the_data_again",
  the_data
)
crew$wait()
c <- crew$pop()
c$result[[1]]
#> [1] NA
c$error
#> [1] "object 'the_data' not found"

Created on 2024-01-26 with reprex v2.1.0

The variable assigned to the value in the internal environment of the package persists between tasks at least once, but it returns NULL by the second task.

Neither crew nor its backend mirai can reset values hidden away inside package environments. To be safe in real-life workflows, each task should manually reset these niche variables, or the controller should impose tasks_max = 1 so each task gets a fresh worker.

I think reset_globals is working properly. If you set reset_globals = FALSE, you may still see some globals go away if your worker exits and then relaunches. The aggressive timeout of seconds_idle = 10 may have caused this on your end. Here is what I see:

crew <- crew::crew_controller_local(
  name = "user",
  workers = 1,
  local_log_directory = "crew_logs",
  reset_globals = FALSE
)
crew$start()

crew$push(
  name = "add_the_data",
  command = the_data,
  globals = list(the_data = "test")
)

crew$wait()
a <- crew$pop()
a$result[[1]]
#> [1] "test"
a$error
#> [1] NA

crew$push(
  name = "verify_the_data",
  the_data
)
crew$wait()
b <- crew$pop()
b$result[[1]]
#> [1] "test"
b$error
#> [1] NA

crew$push(
  name = "verify_the_data_again",
  the_data
)
crew$wait()
c <- crew$pop()
c$result[[1]]
#> [1] "test"
c$error
#> [1] NA

Created on 2024-01-26 with reprex v2.1.0

And with reset_globals = TRUE, we rightly get the opposite result because the_data is cleaned up.

crew <- crew::crew_controller_local(
  name = "user",
  workers = 1,
  local_log_directory = "crew_logs",
  reset_globals = TRUE
)
crew$start()

crew$push(
  name = "add_the_data",
  command = the_data,
  globals = list(the_data = "test")
)

crew$wait()
a <- crew$pop()
a$result[[1]]
#> [1] "test"
a$error
#> [1] NA

crew$push(
  name = "verify_the_data",
  the_data
)
crew$wait()
b <- crew$pop()
b$result[[1]]
#> [1] NA
b$error
#> [1] "object 'the_data' not found"

crew$push(
  name = "verify_the_data_again",
  the_data
)
crew$wait()
c <- crew$pop()
c$result[[1]]
#> [1] NA
c$error
#> [1] "object 'the_data' not found"

Created on 2024-01-26 with reprex v2.1.0

Hi @wlandau,
Thanks for the response and apologies for the delay, I was out on vacation for the entirety of February.
I'm not sure I explained the issue well enough.

Neither crew nor its backend mirai can reset values hidden away inside package environments. To be safe in real-life workflows, each task should manually reset these niche variables, or the controller should impose tasks_max = 1 so each task gets a fresh worker.

The reprex demonstrates how it is indeed resetting the value in the package though, possibly by reloading the package?

The code written to the package in the reprex above creates an environment when the testPackage package loads. There's one function in the package that assigns a variable to that environment and recalls the variable from the environment if no new variables are passed into the function.

In the first task, we pass a variable into the crew task which is then assigned by that function to the environment internal to the package.
In all but the second subsequent tasks, the variable assigned to that environment is inaccessible.

In your reprex, you pass the variable to each subsequent crew call, so of course that variable is available in each crew task.

If reset_packages is FALSE one would expect that the variable assigned to the package's internal environment in the first task would remain constant throughout all subsequent tasks, but the reprex demonstrates that that's not the case.

Does the issue make more sense now?

The reprex demonstrates how it is indeed resetting the value in the package though, possibly by reloading the package?

As written originally in #143 (comment), your test with packages also uses seconds_idle = 10. As explained in #143 (comment), this will cause the workers to shut themselves down if there is a 10-second period of idling with no tasks. When I removed seconds_idle = 10 in your packages test, every task returned "test" as expected.

The reprex demonstrates how it is indeed resetting the value in the package though, possibly by reloading the package?

As written originally in #143 (comment), your test with packages also uses seconds_idle = 10. As explained in #143 (comment), this will cause the workers to shut themselves down if there is a 10-second period of idling with no tasks. When I removed seconds_idle = 10 in your packages test, every task returned "test" as expected.

@wlandau Ahh, the worker is closing out 10s after every task and that's why the packages refresh when loaded each time. Got it, thank you!