Need data argument for bootstrapping
billdenney opened this issue · 6 comments
Similar to #4, can a data
argument be added to boot_nlme
, boot_lme
, and boot_nls
?
Here is a reprex:
require(nlme)
#> Loading required package: nlme
fit_the_model <- function() {
data(barley, package = "nlraa")
barley2 <- subset(barley, year < 1974)
fit.lp.gnls2 <- gnls(yield ~ linp(NF, a, b, xs), start=c(a=170, b=100, xs=5), data = barley2)
barley2$year.f <- as.factor(barley2$year)
cfs <- coef(fit.lp.gnls2)
fit.lp.gnls3_inner <-
update(fit.lp.gnls2,
params = list(a + b + xs ~ year.f),
start = c(cfs[1], 0, 0, 0,
cfs[2], 0, 0, 0,
cfs[3], 0, 0, 0))
fit.lp.gnls3_inner
}
fit.lp.gnls3 <- fit_the_model()
#> Error in linp(NF, a, b, xs): could not find function "linp"
sims <- simulate_nlme(fit.lp.gnls3, nsim = 3)
#> Error in simulate_nlme(fit.lp.gnls3, nsim = 3): could not find function "simulate_nlme"
boot <- boot_nlme(fit.lp.gnls3, nsim = 3)
#> Error in boot_nlme(fit.lp.gnls3, nsim = 3): could not find function "boot_nlme"
barley3 <- subset(barley, year < 1974)
barley3$year.f <- as.factor(barley3$year)
boot <- boot_nlme(fit.lp.gnls3, nsim = 3, data=barley3)
#> Error in boot_nlme(fit.lp.gnls3, nsim = 3, data = barley3): could not find function "boot_nlme"
Created on 2021-08-01 by the reprex package (v2.0.0)
Thanks for reporting this! boot_* will indeed fail if the data is not in scope. I'll add the data argument as I did for simulate_*
@billdenney notice that the data argument for boot_* will not solve the issue for your particular example. Objects created within a function only live inside that function's environment. For example:
f <- function() x <- 3
x
## Error: object 'x' not found
object barley2 only exists in the environment of fit_the_model. There are many ways of addressing this. The first one is normally discouraged (because of potential side effects), but I have used it occasionally if I think it won't cause problems. It involves using the <<- operator.
Option 1:
library(nlraa)
library(nlme)
fit_the_model <- function() {
data(barley, package = "nlraa")
barley2 <- subset(barley, year < 1974)
fit.lp.gnls2 <- gnls(yield ~ linp(NF, a, b, xs), start=c(a=170, b=100, xs=5), data = barley2)
barley2$year.f <- as.factor(barley2$year)
barley2 <<- barley2 ## barley2 is now available in the parent environment
cfs <- coef(fit.lp.gnls2)
fit.lp.gnls3_inner <-
update(fit.lp.gnls2,
params = list(a + b + xs ~ year.f),
start = c(cfs[1], 0, 0, 0,
cfs[2], 0, 0, 0,
cfs[3], 0, 0, 0))
fit.lp.gnls3_inner
}
fit.lp.gnls3 <- fit_the_model()
sims <- simulate_nlme(fit.lp.gnls3, nsim = 3, data = barley2)
I realize that you might be writing a function which creates or modifies data and then fits a model. In that case, I think it is better to return a list with the fitted model and the new/modified data (and other required objects). This, to me, is a better approach. Now, it is not necessary to pass the data to simulate or boot because it is available in the global environment, so that nlme::getData can find it.
Option 2:
fit_the_model <- function() {
data(barley, package = "nlraa")
barley2 <- subset(barley, year < 1974)
fit.lp.gnls2 <- gnls(yield ~ linp(NF, a, b, xs), start=c(a=170, b=100, xs=5), data = barley2)
barley2$year.f <- as.factor(barley2$year)
cfs <- coef(fit.lp.gnls2)
fit.lp.gnls3_inner <-
update(fit.lp.gnls2,
params = list(a + b + xs ~ year.f),
start = c(cfs[1], 0, 0, 0,
cfs[2], 0, 0, 0,
cfs[3], 0, 0, 0))
## cfs also needs to be available to re-fit the model
ans <- list(model = fit.lp.gnls3_inner, data = barley2, cfs = cfs)
ans
}
res <- fit_the_model()
fit.lp.gnls3 <- res$model
barley2 <- res$data
cfs <- res$cfs
sims <- simulate_nlme(fit.lp.gnls3, nsim = 3)
fit.lp.gnls3.bt <- boot_nlme(fit.lp.gnls3, R = 100)
Sorry for the bad reprex. I was running through the process too quickly. And, thank you for the suggestion.
My goal is similar to option 2 in your example, but my code is a good bit more complex. (That's not the biggest deal in the world, but it does add complexity to everything that has to be kept track of which is what I was hoping to minimize.)
In option 2, the user has to remember to set barley2 <- res$data
, and the barley2
variable must be set in the global environment. My use case is bootstrapping is happening within a function and within a drake
workflow which restricts the user from assigning to the global environment via <<-
due to reproducibility issues. In this better-defined scenario, I hope that a data
argument to the boot_*
functions could fix the issue.
So, a better reprex for my use case is:
library(nlme)
library(nlraa)
fit_the_model <- function() {
data(barley, package = "nlraa")
barley2 <- subset(barley, year < 1974)
fit.lp.gnls2 <- gnls(yield ~ linp(NF, a, b, xs), start=c(a=170, b=100, xs=5), data = barley2)
barley2$year.f <- as.factor(barley2$year)
cfs <- coef(fit.lp.gnls2)
fit.lp.gnls3_inner <-
update(fit.lp.gnls2,
params = list(a + b + xs ~ year.f),
start = c(cfs[1], 0, 0, 0,
cfs[2], 0, 0, 0,
cfs[3], 0, 0, 0))
## cfs also needs to be available to re-fit the model
ans <- list(model = fit.lp.gnls3_inner, data = barley2, cfs = cfs)
ans
}
bootstrap_the_model <- function(model_with_data) {
fit.lp.gnls3 <- model_with_data$model
barley2 <- model_with_data$data
cfs <- model_with_data$cfs
#sims <- simulate_nlme(fit.lp.gnls3, nsim = 3)
fit.lp.gnls3.bt <- boot_nlme(fit.lp.gnls3, R = 100)
fit.lp.gnls3.bt
}
res <- fit_the_model()
my_boot <- bootstrap_the_model(res)
#> Error in eval(mCall$data): object 'barley2' not found
Created on 2021-08-04 by the reprex package (v2.0.0)
No need to apologize at all. I was trying to be as clear as possible in my explanation. The version of the package here in github now has the data argument for the boot_ functions. (If you try it let me know how it goes). However, one issue with your example is that you also need the cfs object, which cannot be found by boot_nlme (environments are nested too deep, I think). My simple fix for this is to replace the cfs object with just numbers in your example above. This is what I mean:
fit.lp.gnls3_inner <-
update(fit.lp.gnls2,
params = list(a + b + xs ~ year.f),
start = c(176, 0, 0, 0,
27, 0, 0, 0,
6, 0, 0, 0))
If you can use the SS version of the function which can help with convergence. (SSlinp does not need starting values - this is irrelevant in the example above but it might be in your specific case).
This is resolved in the current version of the code in github. Is it okay if I close this issue?
Sure. Thanks for implementing it!