lindeloev/job

C stack error on MacOS

fusaroli opened this issue · 32 comments

I'm running some large models and keep getting the error:
"Error: C stack usage 7969264 is too close to the limit", with a suggestion of only passing a smaller part of the environment to the job. However, I cannot find an example of how to do that.
Maybe add a short example to the readme/vignette?

I just merged a bunch of commits where I basically re-wrote the whole thing, including much better memory efficiency. Could you try installing the latest version and see if that fixes it out-of-the-box? remotes::install_github("lindeloev/job")

You are right that I really need to write worked examples of important arguments and cases! Tracking that in #24.

You're looking for the import argument: job::job({<code>}, import = c(var1, var2, ...)). You can also do import = "auto" as an automagic solution, which imports everything from globalenv() that is mentioned in the code (via all.vars()).
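
For a rough idea of what import = "auto" picks up, here is a minimal base-R sketch of the all.vars() idea. It is not job's actual implementation; env, model_data, and unused_blob are hypothetical stand-ins for objects in your global environment.

```r
# Hypothetical stand-ins for objects living in the main session
env <- new.env()
assign("model_data", mtcars, envir = env)
assign("unused_blob", rnorm(1e4), envir = env)

# The code that would be passed to the job
code <- quote({
  fit <- lm(mpg ~ cyl, data = model_data)
  summary(fit)
})

# all.vars() lists variable names used in the code (function names excluded)
mentioned <- all.vars(code)

# Only objects that exist in the environment AND are mentioned get imported
to_import <- intersect(mentioned, ls(env))
print(to_import)
```

Here unused_blob is never mentioned in the code, so it would be left out of the import.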

still getting this

Error: C stack usage 7969264 is too close to the limit
but not the warning that I should use import (which I saw in the penultimate version, I think).
But as soon as I wrap up this analysis, I'll start fresh with a new session and retry, in case there is some clutter from the previous run.

Oof. Could you send sessionInfo()? People exclusively report this error from macs. I need to install MacOS in a VM so I can do fast iterations myself (created #29 to track this). One desperate attempt is disabling the computation-of-import-size which uses recursion; just pushed it to the "cstack" branch:

 remotes::install_github("lindeloev/job@cstack")

Has like 20% chance of being a solution...
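
As a side note, the number in the error message can be compared against the stack limit R sees in the current session. A small sketch using base R's Cstack_info(); the values differ by platform and can be NA where R cannot determine them.

```r
# Named integer vector: size, current, direction, eval_depth
info <- Cstack_info()
print(info)

# Fraction of the C stack currently in use (NA if the size is unknown)
usage <- unname(info["current"] / info["size"])
print(usage)
```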

will check the new version later. For now:

sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] job_0.2 viridis_0.6.0 viridisLite_0.4.0 svglite_2.0.0 formattable_0.2.1
[6] scam_1.2-11 mgcv_1.8-35 nlme_3.1-152 ggstance_0.3.5 patchwork_1.1.1
[11] ggstatsplot_0.7.2 ggridges_0.5.3 ggbeeswarm_0.6.0 ggpubr_0.4.0 scales_1.1.1
[16] posterior_0.1.3 bayesplot_1.8.0 brms_2.15.0 Rcpp_1.0.6 here_1.0.1
[21] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.5 purrr_0.3.4 readr_1.4.0
[26] tidyr_1.1.3 tibble_3.1.1 ggplot2_3.3.3 tidyverse_1.3.1

loaded via a namespace (and not attached):
[1] estimability_1.3 coda_0.19-4 knitr_1.33
[4] dygraphs_1.1.1.6 multcomp_1.4-17 data.table_1.14.0
[7] inline_0.3.17 generics_0.1.0 callr_3.7.0
[10] TH.data_1.0-10 correlation_0.6.1 xml2_1.3.2
[13] lubridate_1.7.10 httpuv_1.6.0 StanHeaders_2.21.0-7
[16] assertthat_0.2.1 WRS2_1.1-1 xfun_0.22
[19] hms_1.0.0 rethinking_2.13 evaluate_0.14
[22] promises_1.2.0.1 fansi_0.4.2 dbplyr_2.1.1
[25] readxl_1.3.1 igraph_1.2.6 DBI_1.1.1
[28] htmlwidgets_1.5.3 reshape_0.8.8 kSamples_1.2-9
[31] stats4_4.0.5 Rmpfr_0.8-4 paletteer_1.3.0
[34] ellipsis_0.3.2 crosstalk_1.1.1 backports_1.2.1
[37] V8_3.4.2 insight_0.13.2 markdown_1.1
[40] ggcorrplot_0.1.3 RcppParallel_5.1.3 libcoin_1.0-8
[43] vctrs_0.3.8 remotes_2.3.0 cmdstanr_0.3.0.9000
[46] abind_1.4-5 cachem_1.0.4 withr_2.4.2
[49] metaBMA_0.6.7 checkmate_2.0.0 emmeans_1.6.0
[52] xts_0.12.1 prettyunits_1.1.1 pacman_0.5.1
[55] crayon_1.4.1 pkgconfig_2.0.3 SuppDists_1.1-9.5
[58] labeling_0.4.2 vipor_0.4.5 statsExpressions_1.0.1
[61] rlang_0.4.11 lifecycle_1.0.0 miniUI_0.1.1.1
[64] LaplacesDemon_16.1.4 colourpicker_1.1.0 MatrixModels_0.5-0
[67] sandwich_3.0-0 modelr_0.1.8 cellranger_1.1.0
[70] rprojroot_2.0.2 matrixStats_0.58.0 Matrix_1.3-2
[73] loo_2.4.1 mc2d_0.1-19 carData_3.0-4
[76] boot_1.3-27 zoo_1.8-9 reprex_2.0.0
[79] base64enc_0.1-3 beeswarm_0.3.1 gamm4_0.2-6
[82] processx_3.5.2 PMCMRplus_1.9.0 parameters_0.13.0
[85] shape_1.4.5 multcompView_0.1-8 coin_1.4-1
[88] shinystan_2.5.0 rstatix_0.7.0 ggsignif_0.6.1
[91] memoise_2.0.0 magrittr_2.0.1 plyr_1.8.6
[94] threejs_0.3.3 compiler_4.0.5 rstantools_2.1.1
[97] lme4_1.1-26 cli_2.5.0 pbapply_1.4-3
[100] ps_1.6.0 Brobdingnag_1.2-6 MASS_7.3-53.1
[103] tidyselect_1.1.1 stringi_1.5.3 projpred_2.0.2
[106] yaml_2.2.1 ggrepel_0.9.1 bridgesampling_1.1-2
[109] grid_4.0.5 tools_4.0.5 parallel_4.0.5
[112] rio_0.5.26 rstudioapi_0.13 foreign_0.8-81
[115] gridExtra_2.3 ipmisc_6.0.0 pairwiseComparisons_3.1.5
[118] farver_2.1.0 digest_0.6.27 shiny_1.6.0
[121] BWStest_0.2.2 car_3.0-10 broom_0.7.6
[124] BayesFactor_0.9.12-4.2 performance_0.7.1 later_1.2.0
[127] httr_1.4.2 rsconnect_0.8.17 effectsize_0.4.4-1
[130] colorspace_2.0-0 rvest_1.0.0 fs_1.5.0
[133] splines_4.0.5 statmod_1.4.35 rematch2_2.1.2
[136] shinythemes_1.2.0 systemfonts_1.0.1 xtable_1.8-4
[139] gmp_0.6-2 jsonlite_1.7.2 nloptr_1.2.2.2
[142] rstan_2.21.2 zeallot_0.1.0 modeltools_0.2-23
[145] R6_2.5.0 pillar_1.6.0 htmltools_0.5.1.1
[148] mime_0.10 glue_1.4.2 fastmap_1.1.0
[151] minqa_1.2.4 DT_0.18 codetools_0.2-18
[154] pkgbuild_1.2.0 mvtnorm_1.1-1 utf8_1.2.1
[157] lattice_0.20-44 curl_4.3.1 gtools_3.8.2
[160] logspline_2.1.16 zip_2.1.1 shinyjs_2.0.0
[163] openxlsx_4.2.3 survival_3.2-11 rmarkdown_2.7
[166] munsell_0.5.0 haven_2.4.1 reshape2_1.4.4
[169] gtable_0.3.0 bayestestR_0.9.0

Hi again, @fusaroli. Could you post a reproducible example?

I managed to install the same version of R on MacOS Catalina (in a VM). But this simple example runs fine (in a new session):

library(brms)
data = mtcars
model = mpg ~ carb + cyl
x = rnorm(n = 5*10^7)  # approx 400 MB of data

job::job({
    print(mean(x))
    fit = brm(model, data)
})

Maybe you had some R6 classes loaded? They tend to include a lot of self-references. job is robust to this for standard R6 classes, but maybe not in all cases!
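
To illustrate why self-references matter, here is a minimal sketch (not job's actual code) of an environment that points to itself, plus a recursive walk that remembers what it has already visited. Without the seen check, the recursion would never terminate and would eventually exhaust the C stack. count_objects is a hypothetical helper written for this example.

```r
# An environment that refers to itself, much like R6 objects do via `self`
e <- new.env()
e$self <- e
e$payload <- 1:10

# Cycle-safe walk: skip environments that were already visited
count_objects <- function(env, seen = list()) {
  for (s in seen) {
    if (identical(s, env)) return(0L)
  }
  seen <- c(seen, list(env))
  n <- 0L
  for (name in ls(env)) {
    obj <- get(name, envir = env)
    n <- n + if (is.environment(obj)) count_objects(obj, seen) else 1L
  }
  n
}

n_objects <- count_objects(e)
print(n_objects)
```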

I cleaned up my session and ran the following example (from this paper: https://www.sciencedirect.com/science/article/pii/S0010027718302804). The model is not optimal; it's just an example.

Data: https://www.dropbox.com/s/figj2qzggwuoy42/d1ImputedData.csv?dl=0

ChildTokens_f0 <- bf(
  ChildTokens2S ~ 0 + Diagnosis + 
    Diagnosis:MSEL_EL + 
    Diagnosis:ChildTokens1S + 
    Diagnosis:ParentMLU1S + 
    (1 + MSEL_EL + ChildTokens1S + ParentMLU1S | gr(ChildID, by=Diagnosis))
)

priorTokens_gaussian <- c(
  prior(normal(0, 0.2), class = b),
  prior(normal(0, 1), class = b, coef="DiagnosisASD"),
  prior(normal(0, 1), class = b, coef="DiagnosisTD"),
  prior(normal(0, 0.6), class = sd),
  prior(lkj(5), class = cor)
)

job::job(brm_result = {
ChildTokensPred0 <- brm(
  ChildTokens_f0, 
  data = Data,
  family = gaussian,
  prior = priorTokens_gaussian,
  sample_prior = T,
  file = here("models","Predictions", "ChildTokensPred0_m"),
  chains = 2,
  cores = 2,
  iter = 2000,
  backend = "cmdstanr",
  threads = threading(2),
  control = list(
    adapt_delta=0.99, 
    max_treedepth=20),
  save_pars = save_pars(all = TRUE)
  )
  ChildTokensPred0 <- add_criterion(ChildTokensPred0, criterion="loo")
})

I get the same error

Error: C stack usage 7969232 is too close to the limit

It runs fine in my MacOS VM (and Windows 10). Same sessionInfo() as the one you posted earlier. Could you try and reinstall job and test it again?

remove.packages("job")
remotes::install_github("lindeloev/job")

It's been v0.2 for a long time, even though the code has been updated. If this doesn't work, I'm completely blank!

it's odd, but I keep getting the same issue. I simplified the model to just the intercept, tried a smaller dataset, but still

Error: C stack usage 7969184 is too close to the limit

Other MacOS users have similarly experienced CStack errors, so I'd really appreciate if we could track down this bug! It's what keeps me from submitting to CRAN. No worries if you're too busy, though!

It's definitely not the data size that's the issue here but likely something about the import into the job. Things to try:

CStack branch

Did you try using the cstack branch in which I've disabled the infinite-regress-prone code? Try running some "failing" code after doing

remotes::install_github("lindeloev/job@cstack")

Without job

Does it work if you save the code inside job::job() to a script, then call rstudioapi::jobRunScript("your_script.R", name = "brm_result") instead of job::job()? This should also import from your environment and return brm_result. If this works, we've at least located that the error is job-specific.
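
A minimal sketch of that test, assuming the rstudioapi package is installed. The script contents are placeholders for the real brm() code, and the importEnv/exportEnv values are what I'd expect to be needed to import from and return to the global environment.

```r
# Write the code that would go inside job::job() to a script file
script <- file.path(tempdir(), "your_script.R")
writeLines(c(
  "foo <- 10",
  "brm_result <- rnorm(5)  # placeholder for the real brm() call"
), script)

# jobRunScript() only works inside RStudio, so guard the call
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  rstudioapi::jobRunScript(
    path = script,
    name = "brm_result",
    importEnv = TRUE,          # copy the global environment into the job
    exportEnv = "R_GlobalEnv"  # copy the results back when the job finishes
  )
}
```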

README examples

In a new session, do the two examples from the README run? If not, does adding import = NULL, packages = NULL, opts = NULL as arguments help?

No import

Does your brms code run OK if you put everything in job::job({<all code here>}, import = NULL)? I.e. a blank session and everything inside job::job(). If it runs OK, it's the import.

I will (it might take some time, with the ongoing exams and silly Danish vacations, but I will)

CStack branch: nope. same error.
README: no, same error. Me silly for not having tried it first.
No import: same error.
Without job: Hurra! It works!

Wow, so the error did not arise where I thought! Do you have something special in your .Rprofile or elsewhere? Either way, I just pushed a new version with a lot of print() commands, so we can see where it fails. Could you install the latest version:

remotes::install_github("lindeloev/job@cstack")

And then run some CStack-failing code in a new session, e.g., the README example:

job::job({
    foo = 10
    bar = rnorm(5)
})

and then paste what's printed to the console of your main session? This is the printing from a successful run:

[1] "Entered job::job()"
[1] "Get args"
[1] "Get return name"
[1] "Assign unnamed by position"
[1] "Code to string"
[1] "Set job title"
[1] "Get list of attached packages"
[1] "save_env: init"
[1] "save_env: Computing environment size"
[1] "save_env: Saving environment"
[1] "Init import settings"
[1] "save_opts: Saving"
[1] "Building R code for job"
[1] "Calling jobRunScript()"

The basic example from the readme works.
My reproducible example from above gives:
[1] "Entered job::job()"
[1] "Get args"
[1] "Get return name"
[1] "Assign unnamed by position"
[1] "Code to string"
[1] "Set job title"
[1] "Get list of attached packages"
[1] "save_env: init"
[1] "save_env: Computing environment size"
[1] "save_env: Saving environment"
[1] "Init import settings"
[1] "save_opts: Saving"
Error: C stack usage 7969408 is too close to the limit

🤯🤯🤯 That brings it down to very few innocent (I thought) lines of code. Can you replicate the CStack error by running this code? It saves a small temporary file in your OS's temp dir (to be loaded from within the job).

# Set up fail condition
library(brms)
opts = options()

# The potentially misbehaving code
.__js__ = list(
    opts = opts,
    wd = getwd()
  )
settings_file = gsub("\\\\", "/", tempfile())  # Windows only: Easier to paste() later
suppressWarnings(saveRDS(.__js__, settings_file))

If this doesn't CStack-fail, I think I'm out of ideas.

However, if it DOES CStack-fail, we can hunt it down! To me, it hints at some weird interaction between options set by brms (and dependencies) and saveRDS(). To verify, could you try these in blank sessions?

  1. Run the same with library(brms) commented out and verify that it runs?
  2. Keep library(brms) but set opts = list() and verify that it runs?

On my end, library(brms) sets 23 options. We simply need to identify the "bad" one(s). Could you post the output of this?

# Get options before and after loading brms
opts_empty = options()
library(brms)
opts_post = options()

# Print added options
print(opts_post[!names(opts_post) %in% names(opts_empty)])

cstack-fail confirmed. working on the follow ups

commented out brms condition: it runs

Keep library(brms) but set opts = list() and verify that it runs?
That also runs

woohoo, following the live-action intensely

Output (added options)

$askpass
function (prompt) 
{
    .rs.askForPassword(prompt)
}
<environment: 0x7ffe52a7c450>

$check.bounds
[1] FALSE

$editor
[1] "vi"

$help.try.all.packages
[1] FALSE

$HTTPUserAgent
[1] "RStudio Desktop (1.4.1118); R (4.0.5 x86_64-apple-darwin17.0 x86_64 darwin17.0)"

$install.packages.compile.from.source
[1] "interactive"

$internet.info
[1] 2

$keep.parse.data
[1] TRUE

$keep.parse.data.pkgs
[1] FALSE

$pager
function (files, header, title, delete.file) 
{
    .rs.pager(files, header, title, delete.file)
}
<environment: 0x7ffe52a7c450>

$PCRE_study
[1] FALSE

$PCRE_use_JIT
[1] TRUE

$pdfviewer
[1] "/usr/bin/open"

$plumber.docs.callback
function (url) 
{
    invisible(.Call("rs_plumberviewer", url, getwd(), "window", 
        PACKAGE = "(embedding)"))
}
<environment: base>
attr(,"plumberViewerType")
[1] "window"

$plumber.swagger.url
function (url) 
{
    invisible(.Call("rs_plumberviewer", url, getwd(), "window", 
        PACKAGE = "(embedding)"))
}
<environment: base>
attr(,"plumberViewerType")
[1] "window"

$printcmd
[1] "lpr"

$profvis.prof_extension
[1] ".Rprof"

$prompt
[1] "> "

$useFancyQuotes
[1] TRUE

$verbose
[1] FALSE

$viewer
function (url, height = NULL) 
{
    if (!is.character(url) || (length(url) != 1)) 
        stop("url must be a single element character vector.", 
            call. = FALSE)
    if (identical(height, "maximize")) 
        height <- -1
    if (!is.null(height) && (!is.numeric(height) || (length(height) != 
        1))) 
        stop("height must be a single element numeric vector or 'maximize'.", 
            call. = FALSE)
    invisible(.Call("rs_viewer", url, height, PACKAGE = "(embedding)"))
}
<environment: 0x7ffe52a7c450>

$warn
[1] 0

$<NA>
NULL

So we're down to setting one (or several) options to NULL (e.g., options(viewer = NULL)) until it works. My prime suspects are those in different environments.
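
Instead of removing options by hand, the search could perhaps be automated. The sketch below tries to serialize each option on its own and reports the ones that raise an error. It assumes the failure shows up as a catchable R error for a single option, which may not hold if the problem only arises from a combination of options.

```r
opts <- options()

# Try to serialize each option separately; FALSE marks the ones that fail
ok <- vapply(names(opts), function(nm) {
  tryCatch({
    serialize(opts[[nm]], connection = NULL)
    TRUE
  }, error = function(e) FALSE)
}, logical(1))

# Names of options that could not be serialized (empty if all went fine)
bad_options <- names(ok)[!ok]
print(bad_options)
```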

If I'm right, we could avoid the CStack error by running the same script as above but removing some options after loading brms?

library(brms)
options(askpass = NULL, pager = NULL, viewer = NULL)
opts = options()
...

perhaps add plumber.docs.callback = NULL, plumber.swagger.url = NULL.

with plumber etc. no error
w/o plumber no error

Fixed! As always,

 remotes::install_github("lindeloev/job@cstack")

Then stress-test it with all you've got. Fingers crossed.

Uhmm, I am sorry to report that the very first example I reported above gives the following:

Job launched.                                                                      
Error: C stack usage  7969200 is too close to the limit

Argh! OK, I hope this fairly general solution works:

remotes::install_github("lindeloev/job@cstack")

If not, this one contains exactly the solution that worked yesterday:

remotes::install_github("lindeloev/job@cstack2")

This is odd, but neither works. Just to make sure, here is the code I'm running:

pacman::p_load(
  tidyverse,
  job,
  brms
)
d2 <- read_csv("data/d1ImputedData.csv")

ChildTokens_f0 <- bf(
  ChildTokens2S ~ 0 + Diagnosis + 
    Diagnosis:MSEL_EL + 
    Diagnosis:ChildTokens1S + 
    Diagnosis:ParentMLU1S + 
    (1 + MSEL_EL + ChildTokens1S + ParentMLU1S | gr(ChildID, by=Diagnosis))
)

priorTokens_gaussian <- c(
  prior(normal(0, 0.2), class = b),
  prior(normal(0, 1), class = b, coef="DiagnosisASD"),
  prior(normal(0, 1), class = b, coef="DiagnosisTD"),
  prior(normal(0, 0.6), class = sd),
  prior(lkj(5), class = cor)
)

job::job(brm_result = {
  ChildTokensPred0 <- brm(
    ChildTokens_f0, 
    data = d2,
    family = gaussian,
    prior = priorTokens_gaussian,
    sample_prior = T,
    chains = 2,
    cores = 2,
    iter = 2000,
    control = list(
      adapt_delta=0.99, 
      max_treedepth=20),
    save_pars = save_pars(all = TRUE)
  )
  ChildTokensPred0 <- add_criterion(ChildTokensPred0, criterion="loo")
})

Thanks for your help and patience so far, @fusaroli! I'll need to focus on other things for a few days, but will return to this.

I wonder if it would be possible to set up a remote desktop session so I could try out a few solutions interactively to speed up iterations and interrupt you less? I'm on Windows 10.

Tagging related errors that were fixed: #4 #10 #17.

OK, pushed a new version and it runs on other users' Mac RStudios. I found out that some options were really calls and a few other things, so they are fixed now as well. If you want to try it out, @fusaroli:

  1. remotes::install_github("lindeloev/job")
  2. Restart RStudio (don't know why, but it seems to help!)
  3. Run the brms example.

Crossing fingers and toes :-)
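
For readers who hit something similar in their own code, one defensive approach is to drop options that are functions, calls, or environments before saving them. This is only a sketch of the general idea under that assumption, not necessarily what job does internally; safe_opts and is_risky are names made up for the example.

```r
opts <- options()

# Flag options holding functions, language objects (calls), or environments
is_risky <- vapply(
  opts,
  function(x) is.function(x) || is.language(x) || is.environment(x),
  logical(1)
)

# Keep only plain-data options and round-trip them through saveRDS()
safe_opts <- opts[!is_risky]
settings_file <- tempfile(fileext = ".rds")
saveRDS(safe_opts, settings_file)
restored <- readRDS(settings_file)
```

The trade-off is that any dropped option is simply not available inside the job, so this only suits options the job code does not depend on.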

OK, wait, I just replicated it on my VM! Will update if I fix it.

Found it!!!!1!!one!! We encountered this bug: r-lib/cpp11#116. Even though it should've been fixed in an older version of cpp11, it re-appears in some weird combo of loading packages and running brms::prior().

Would appreciate if you can try and confirm it cf. my reply above, @fusaroli.

I tested the new version on 4 different models and so far no c stack error!
Awesome work!

A.w.e.s.o.m.e! Wonderful to see this closed. It seems like a bug a lot of users could encounter.