easystats/parameters

bootstrap_model object unusable by emmeans

plants-22 opened this issue · 7 comments

Hi, I have a problem with a bootstrap_model feeding into emmeans.

The model that was bootstrapped was a GLMM (tweedie) glmmTMB object (formula = cover_pc ~ Plot + (1|Block)). I can get bootstrap_parameters fine but haven't had any luck feeding emmeans with a bootstrapped model object. Here's the process I used to make the bootstrap_model object:

mod <- all_model_objects[[3]][["obj_model_plot"]]

nCores <- 6 

cl <- makeCluster(nCores)
clusterEvalQ(cl, library("glmmTMB"))
cover_filtered <- all_model_objects[[3]][["data"]] # Tell bootstrap where to find its data set
clusterExport(cl, varlist = "cover_filtered")

test <- bootstrap_model(model = mod, parallel = "snow", n_cpus = nCores, cl = cl) 

stopCluster(cl)

I get a bootstrap_model object with a giant attribute list which all looks normal.

>   attributes(test)
$names
[1] "(Intercept)"          "PlotExclude all"      "PlotExclude kangaroo" "PlotExclude rabbit"  
[5] NA                    

$row.names
   [1]    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21
 ...
 [988]  988  989  990  991  992  993  994  995  996  997  998  999 1000

$class
[1] "bootstrap_model"     "see_bootstrap_model" "data.frame"         

$original_model
Formula:          cover_pc ~ Plot + (1 | Block)
Data: cover_filtered
      AIC       BIC    logLik  df.resid 
123.73279 129.14091 -54.86639         9 
Random-effects (co)variances:

Conditional model:
 Groups Name        Std.Dev.
 Block  (Intercept) 0.5387  

Number of obs: 16 / Conditional model: Block, 4

Dispersion parameter for tweedie family (): 1.45 

Tweedie power estimate: 1.23 

Fixed Effects:

Conditional model:
         (Intercept)       PlotExclude all  PlotExclude kangaroo    PlotExclude rabbit  
              3.6926               -1.7852               -1.3309               -0.8183  

When I pass it to emmeans I get...

pairs <- emmeans(object = test, specs = pairwise ~ Plot)

Error: Oops! Cannot create the reference grid. Please open an issue at https://github.com/easystats/parameters/issues.

Any help would be appreciated.

Cheers Neil

Hi Neil,

Can you post some data that can be used to reproduce this error?

Mixed model and data for emmeans testing.zip *
glmmTMB bootstrap_model object.zip

Yes, here's a zip with the data in a list along with the glmmTMB model made from the data.

Thanks

*I just updated the data so it has only the bits you need

Hi, I should add, I have the same issue if I try emmeans on a bootstrap_parameters data frame. I'm more interested in using emmeans on the bootstrap_parameters df than the bootstrap_model objects but both should be addressed. I'll post an example soon

So the idea of bootstrap_model() and bootstrap_parameters() was to try and give easy access to basic bootstrapping (same for the emmeans support). For basic (g)lm model, these features work fine.

But for more complex models (mixed models, mixture model, models with multiple componants) things get messy and I'm not sure it would be feasible to try and support all such cases.

Having said that, here are two solutions to the issue with this type of glmmTMB models:

1. Use bootstrap_model()

library(parameters)
library(glmmTMB)
library(emmeans)

load("glmmTMB bootstrap_model object.rdata")
test <- x[[1]]

load("Mixed model and data for emmeans testing.rdata")
cover_filtered <- x$data
obj_model_plot <- x$obj_model_plot

rg1 <- ref_grid(obj_model_plot)
colnames(rg1@V)

colnames(test) # names must match
#> [1] "(Intercept)"          "PlotExclude all"      "PlotExclude kangaroo" "PlotExclude rabbit"  
#> [5] NA

# Let's get rig of the 5th column
# (Which I think the the sd{intercept})
test[5] <- NULL

(em1 <- emmeans(obj_model_plot, ~ Plot))
#>  Plot             emmean    SE df lower.CL upper.CL
#>  Control            3.69 0.312  9     2.99     4.40
#>  Exclude all        1.91 0.401  9     1.00     2.81
#>  Exclude kangaroo   2.36 0.371  9     1.52     3.20
#>  Exclude rabbit     2.87 0.341  9     2.10     3.64
#> 
#> Results are given on the log (not the response) scale. 
#> Confidence level used: 0.95

(em2 <- emmeans(test, ~ Plot))
#>  Plot             emmean lower.HPD upper.HPD
#>  Control            3.61     1.440      4.64
#>  Exclude all        1.85    -0.517      3.08
#>  Exclude kangaroo   2.34     0.214      3.44
#>  Exclude rabbit     2.85     0.824      4.02
#> 
#> Point estimate displayed: median 
#> Results are given on the log (not the response) scale. 
#> HPD interval probability: 0.95


pairs(em1)
#>  contrast                          estimate    SE df t.ratio p.value
#>  Control - Exclude all                1.785 0.329  9   5.433  0.0019
#>  Control - Exclude kangaroo           1.331 0.295  9   4.508  0.0066
#>  Control - Exclude rabbit             0.818 0.249  9   3.283  0.0392
#>  Exclude all - Exclude kangaroo      -0.454 0.381  9  -1.192  0.6465
#>  Exclude all - Exclude rabbit        -0.967 0.355  9  -2.727  0.0905
#>  Exclude kangaroo - Exclude rabbit   -0.513 0.322  9  -1.593  0.4283
#> 
#> Results are given on the log (not the response) scale. 
#> P value adjustment: tukey method for comparing a family of 4 estimates

pairs(em2)
#>  contrast                          estimate lower.HPD upper.HPD
#>  Control - Exclude all                1.755     1.120     2.617
#>  Control - Exclude kangaroo           1.300     0.665     1.991
#>  Control - Exclude rabbit             0.783     0.130     1.378
#>  Exclude all - Exclude kangaroo      -0.451    -1.365     0.311
#>  Exclude all - Exclude rabbit        -0.985    -1.767    -0.249
#>  Exclude kangaroo - Exclude rabbit   -0.508    -1.224     0.247
#> 
#> Point estimate displayed: median 
#> Results are given on the log (not the response) scale. 
#> HPD interval probability: 0.95 

2. Use lme4::bootMer()

glmmTMB supports lme4::bootMer(), which you can use to bootstrap contrast estimates (see this blog post for more info).

get_pairwise <- function(mod) {
  .em <- emmeans(mod, ~ Plot)
  .pairs <- summary(pairs(.em))
  
  setNames(.pairs$estimate, .pairs$contrast)
}

boot_pairs <- lme4::bootMer(obj_model_plot, FUN = get_pairwise, nsim = 10)
bayestestR::describe_posterior(as.data.frame(boot_pairs$t), test = NULL)
#> Summary of Posterior Distribution
#> 
#> Parameter                         | Median |         95% CI
#> -----------------------------------------------------------
#> Control - Exclude all             |   1.72 | [ 1.41,  2.11]
#> Control - Exclude kangaroo        |   1.18 | [ 0.67,  1.57]
#> Control - Exclude rabbit          |   0.77 | [ 0.42,  1.30]
#> Exclude all - Exclude kangaroo    |  -0.53 | [-1.21, -0.13]
#> Exclude all - Exclude rabbit      |  -0.94 | [-1.56, -0.41]
#> Exclude kangaroo - Exclude rabbit |  -0.40 | [-0.69,  0.10]

Thank you so much @mattansb! I really appreciate your time and help and I have it working now.

It turns out I stitched myself up. I had been running a modified version of bootstrap_parameters() that I added a 'cl' argument to, to getting working under 'snow' parallel processing. At some point after the 'cl' problem was solved on Github, the 'cl' package update came through however I was still using the modified script because I think there was delay in the update. It looks like the script (a last year's version of bootstrap_parameters, not sure which) was generating the mystery 5th row in the bootstrap_parameters output. If I run the current release of bootstrap_parameters, "snow" works and there is no longer an extra line in the data frame output, so two problems fixed.

I'm not sure how the 5th column in the bootstrap_model output example I posted came about. If I run bootstrap_model now I only get the columns I need. So, I can now feed a bootstrap_parameters object or a bootstrap_model object (generated from a glmm) to emmeans and it all works. Cheers!

Great! So I'm closing this issue. Feel free to reopen if you encounter any further related issues.