tlverse/origami

Error when passing optional arguments to `cross_validate`

osofr opened this issue · 10 comments

osofr commented

Minimal example below, taken from vignette. The function cross_validate currently errors out whenever I try to pass any optional arguments to cv_fun . The error appears to be downstream with future package. This is crucial to my functionality, I have to pass optional arguments to cv_fun. Any advice on work-arounds?

library(origami)
cvlm <- function(fold, mydata) {
    train_data <- training(mydata)
    valid_data <- validation(mydata)

    mod <- lm(mpg ~ ., data = train_data)
    preds <- predict(mod, newdata = valid_data)
    list(coef = data.frame(t(coef(mod))), SE = ((preds - valid_data$mpg)^2))
}

# cross-validated estimate
folds <- make_folds(mtcars)
results <- cross_validate(cvlm, folds, mtcars)
results <- cross_validate(cvlm, folds, mydata = mtcars)

In general, I think it might not be the best practice to provide an example with a function cvlm that depends on some object (mtcars) that was defined in the calling environment of this function. This creates quite a bit of confusion and makes it hard to read the code. That's just my opinion though.

ck37 commented

Yep I ran into that exact same issue myself, and reported it here (see workarounds): HenrikBengtsson/future#156

Thanks for reporting this error, Oleg. And, thanks for having already reported this error in future to Henrik, Chris.

I'm honestly not sure of a quick fix off of the top of my head, but can do a bit of digging and report anything back here. Based on the conversation in the issue reported to future, it looks like Henrik has plans to fix this soon.

Also, thanks for pointing out that slip up in the definition of the cvlm function. I definitely agree that having the function defined in that way is bad practice and will update that example (and any others like it) to reflect this.

future.globals=FALSE should work fine for plan(sequential) and plan(multicore), but for plan(multisession) you'll need to specify the globals and packages needed by the different sessions manually. You can see an example of that here: https://github.com/jeremyrcoyle/SuperLearner/blob/integrate_origami/tests/testthat/test-origami_SuperLearner.R#L39

This seems like something of a regression from when we were using foreach. Depending on how long the underlying issue takes to get fixed, we might want to consider going back to foreach or having an option to use either foreach or future. Thoughts?

Ok -- good to know that you encountered this problem in the origami-SL integration as well.

Given that its unclear when the error might be resolved on the foreach side, the most sensible thing seems to be to restore the foreach implementation, having either an option for users to specify use of foreach v. future (as you suggested), or simply having foreach be a failsafe for any problems that might arise from use of future (the latter assumes that the foreach implementation was fairly robust and suggests something in the style of try-catch).

ck37 commented

This wasn't encountered in the original SL implementation, it was added today: https://github.com/jeremyrcoyle/SuperLearner/commit/b0b3201befb7c262a8a4220ce45e403903056c14

With examples (and/or tests) of the parallelization options in origami it would have been identified though.

@ck37 I've been aware of the issue for a while now (see https://app.asana.com/0/270379281729322/369465305300769/f), but haven't had time to look into it in more detail (I'm currently on vacation). There's a commented out test here: https://github.com/jeremyrcoyle/origami/blob/master/tests/testthat/test_future_plan.R#L42 that is disabled because it was failing in some environments, and it didn't seem to make sense to delay our package's release due to an upstream issue we don't yet fully understand/have to wait for Henrik to fix.

ck37 commented

Henrik has fixed the bug: HenrikBengtsson/future#156

Looks like we can move to close this issue once we confirm that the change in globals resolved our problem (after installing globals via remotes::install_github("HenrikBengtsson/globals@develop") as per Henrik's recommendation).

It sounds like he's planning a new CRAN release, so we should be ok without having to mess with how dependencies are currently specified for origami.

Closing since globals 0.10.1 has been added to CRAN and made a dependency for future.

The issue with the example cv_funs relying on objects defined in the calling environment, as noted in the original post, has been resolved in all examples as of 64569a8.