case_bootstrap of lme4 model with transformed variables and offset
jradinger opened this issue · 2 comments
Case-wise bootstrapping of a lme4 model (using case_bootstrap()) does not work in case of transformed variables and when an offset is included. This is because the model uses the dataframe extracted from the model (model@frame; https://github.com/aloy/lmeresampler/blob/master/R/bootstrap_lme4.R#L37) which has columns already transformed (and the column names reformated). So the refitting the model based on the bootstrapped dataframe cannot find the original (untransformed) variables.
library(lme4)
library(lmeresampler)
data <- expand.grid(re1=c("A","B","C","D","E","F","G","H"),
fe1=c("I","II","III"))
data$fe2 <- runif(nrow(data))
data$y <- sample(c(1:100),nrow(data))
data$effort <- sample(c(1000:2000),nrow(data))
mod.glmer0 <- glmer(y ~ fe1 + fe2 + (1 | re1), family=poisson, data=data)
mod.glmer1 <- glmer(y ~ fe1 + log(fe2) + (1 | re1) , family=poisson, data=data)
mod.glmer2 <- glmer(y ~ fe1 + fe2 + (1 | re1) + offset(log(effort)) , family=poisson, data=data)
mod.glmer3 <- glmer(y ~ fe1 + log(fe2) + (1 | re1) + offset(log(effort)), family=poisson, data=data)
case_bootstrap(mod.glmer0,.f=fixef,
B=10,resample=c(TRUE,TRUE))
Extracting the original untransformed dataframe from an lme4 model is not possible. A workaround is to provide in addition to the model also the original dataframe (this is also done for example in some of the functions of the 'merTools' package (https://github.com/jknowles/merTools/blob/master/R/merData.R#L167-L170)):
case_bootstrap.merMod <- function(model, .f, B, resample, origData=NULL){
if(!is.null(origData)){
data <- origData
}else{
data <- model@frame
}
flist <- lme4::getME(model, "flist")
re_names <- names(flist)
clusters <- c(rev(re_names), ".id")
...}
With this approach, it is needed to check the order of the columns in the original data as this might not correspond to the order of the variables in the model. This becomes an issue here: https://github.com/aloy/lmeresampler/blob/master/R/resamplers.R#L61 where the first column of the data is assumed to be the response 'y'. Solving this could look like:
colnames(resamp_data)[colnames(resamp_data)==getResponseFromFormula(model)] <- "y" # In case if origData is provided and the column order is not in order with formula --> then indexing 'y' with 1 fails
Just forgot to define the getResponseFromFormula
:
getResponseFromFormula = function(model) {
if (attr(terms(model) , which = 'response'))
all.vars(getCall(model)$formula)[1]
else
NULL
}
@jradinger, thanks for submitting this bug and fix! I just pushed these changes to GitHub, so you can use remotes
to install the newest version. I will be pushing to CRAN within the next few weeks, but there are a few other things I am working out first.