Error with model-averaged binomial GLMM (with glmmTMB): subscript out of bounds
MarcRieraDominguez opened this issue · 2 comments
Hello Russell! Thank you for maintaining the package!
Describe the bug
emmeans::emmeans()
won't work with a model-averaged binomial GLMM (class averaging
, MuMIn
package), when fitted with glmmTMB::glmmTMB()
:
Error in ME[, nms, drop = FALSE] : subscript out of bounds
I suspect it has to do with trouble accessing the data, perhaps due to shifting variable names. Unfortunately, don't know how to fix it.
The problem appears to be specific to model-averaged glmmTMB
: a model-averaged binomial GLM fitted with glmmTMB
throws the same error; but model-averaged lme4::glmer()
(binomial GLMM) and stats::glm()
(binomial GLM) work fine. I'd rather stay with glmmTMB
because it allows parallelization (my data has > 61000 rows, which I have not shared. Instead, I've prepared a reprex that throws the same error).
Expected behavior
The common output of emmeans::emmeans()
: estimated marginal means, with their standard errors, and Tukey contrats.
Additional context
I ran into this issue with a model-averaged binomial GLMM that related a proportion to: categorical factors (unordered), numerical variables with second-order polynomials (poly(x, 2, raw = TRUE)
), and numerical variables without polynomials. My dataset is large (around 62000 rows, and not yet published), so I've devised a reprex that throws the same error message.
To reproduce
library(emmeans)
#> Warning: package 'emmeans' was built under R version 4.2.3
library(glmmTMB)
#> Warning: package 'glmmTMB' was built under R version 4.2.3
#> Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
#> TMB was built with Matrix version 1.6.1
#> Current Matrix version is 1.4.1
#> Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package
library(lme4)
#> Loading required package: Matrix
library(MuMIn)
#> Warning: package 'MuMIn' was built under R version 4.2.3
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.2.2
#> Warning: package 'tibble' was built under R version 4.2.3
#> Warning: package 'tidyr' was built under R version 4.2.3
#> Warning: package 'purrr' was built under R version 4.2.3
#> Warning: package 'dplyr' was built under R version 4.2.3
#> Warning: package 'stringr' was built under R version 4.2.3
# Setting for MuMin
options(na.action = "na.fail")
# Create response variable for binomial GLMM
# Inspired from here (page 4): https://glmmtmb.github.io/glmmTMB/articles/glmmTMB.html
Owls <- mutate(glmmTMB::Owls,
NCalls = if_else(SiblingNegotiation == 0, 0, 1))
summary(Owls)
#> Nest FoodTreatment SexParent ArrivalTime
#> Oleyes : 52 Deprived:320 Female:245 Min. :21.71
#> Montet : 41 Satiated:279 Male :354 1st Qu.:23.11
#> Etrabloz : 34 Median :24.38
#> Yvonnand : 34 Mean :24.76
#> Champmartin: 30 3rd Qu.:26.25
#> Lucens : 29 Max. :29.25
#> (Other) :379
#> SiblingNegotiation BroodSize NegPerChick logBroodSize
#> Min. : 0.00 Min. :1.000 Min. :0.000 Min. :0.000
#> 1st Qu.: 0.00 1st Qu.:4.000 1st Qu.:0.000 1st Qu.:1.386
#> Median : 5.00 Median :4.000 Median :1.200 Median :1.386
#> Mean : 6.72 Mean :4.392 Mean :1.564 Mean :1.439
#> 3rd Qu.:11.00 3rd Qu.:5.000 3rd Qu.:2.500 3rd Qu.:1.609
#> Max. :32.00 Max. :7.000 Max. :8.500 Max. :1.946
#>
#> NCalls
#> Min. :0.0000
#> 1st Qu.:0.0000
#> Median :1.0000
#> Mean :0.7396
#> 3rd Qu.:1.0000
#> Max. :1.0000
#>
str(Owls)
#> 'data.frame': 599 obs. of 9 variables:
#> $ Nest : Factor w/ 27 levels "AutavauxTV","Bochet",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ FoodTreatment : Factor w/ 2 levels "Deprived","Satiated": 1 2 1 1 1 1 1 2 1 2 ...
#> $ SexParent : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 1 2 1 ...
#> $ ArrivalTime : num 22.2 22.4 22.5 22.6 22.6 ...
#> $ SiblingNegotiation: int 4 0 2 2 2 2 18 4 18 0 ...
#> $ BroodSize : int 5 5 5 5 5 5 5 5 5 5 ...
#> $ NegPerChick : num 0.8 0 0.4 0.4 0.4 0.4 3.6 0.8 3.6 0 ...
#> $ logBroodSize : num 1.61 1.61 1.61 1.61 1.61 ...
#> $ NCalls : num 1 0 1 1 1 1 1 1 1 0 ...
##############
### Binomial GLMM with glmmTMB
mixmod.tmb <- glmmTMB(NCalls ~ SexParent + poly(BroodSize, 2, raw = TRUE) + FoodTreatment + (1|Nest), data=Owls, family=binomial(link = "logit"))
# Model-averaging
mixmod.tmb.d <- MuMIn::dredge(mixmod.tmb, rank = "AICc")
#> Fixed terms are "cond((Int))" and "disp((Int))"
mixmod.tmb.avg <- MuMIn::model.avg(subset(mixmod.tmb.d, delta <= 6), revised.var = TRUE, fit = TRUE)
# The glmmTMB model works fine, no need to indicated cond(variable_name)
MuMIn::getAllTerms(mixmod.tmb)
#> [1] "cond(FoodTreatment)"
#> [2] "cond(poly(BroodSize, 2, raw = TRUE))"
#> [3] "cond(SexParent)"
#> attr(,"random.terms")
#> [1] "cond(1 | Nest)"
#> attr(,"random")
#> . ~ . + cond(1 | Nest)
#> attr(,"response")
#> NCalls
#> attr(,"order")
#> [1] 3 2 1
#> attr(,"intercept")
#> cond zi disp
#> 1 0 1
#> attr(,"interceptLabel")
#> [1] "cond((Int))" "disp((Int))"
#> attr(,"deps")
#> cond(FoodTreatment)
#> cond(FoodTreatment) NA
#> cond(poly(BroodSize, 2, raw = TRUE)) FALSE
#> cond(SexParent) FALSE
#> cond(poly(BroodSize, 2, raw = TRUE))
#> cond(FoodTreatment) FALSE
#> cond(poly(BroodSize, 2, raw = TRUE)) NA
#> cond(SexParent) FALSE
#> cond(SexParent)
#> cond(FoodTreatment) FALSE
#> cond(poly(BroodSize, 2, raw = TRUE)) FALSE
#> cond(SexParent) NA
summary(model.frame(mixmod.tmb))
#> NCalls SexParent
#> Min. :0.0000 Female:245
#> 1st Qu.:0.0000 Male :354
#> Median :1.0000
#> Mean :0.7396
#> 3rd Qu.:1.0000
#> Max. :1.0000
#>
#> poly(BroodSize, 2, raw = TRUE).1 poly(BroodSize, 2, raw = TRUE).2
#> Min. :1.000000 Min. : 1.00000
#> 1st Qu.:4.000000 1st Qu.:16.00000
#> Median :4.000000 Median :16.00000
#> Mean :4.392321 Mean :20.61603
#> 3rd Qu.:5.000000 3rd Qu.:25.00000
#> Max. :7.000000 Max. :49.00000
#>
#> FoodTreatment Nest
#> Deprived:320 Oleyes : 52
#> Satiated:279 Montet : 41
#> Etrabloz : 34
#> Yvonnand : 34
#> Champmartin: 30
#> Lucens : 29
#> (Other) :379
emmeans::emmeans(mixmod.tmb, specs = pairwise ~ SexParent, adjust = "Tukey", regrid = "response")
#> $emmeans
#> SexParent prob SE df asymp.LCL asymp.UCL
#> Female 0.684 0.0485 Inf 0.589 0.779
#> Male 0.758 0.0394 Inf 0.681 0.835
#>
#> Results are averaged over the levels of: FoodTreatment
#> Confidence level used: 0.95
#>
#> $contrasts
#> contrast estimate SE df z.ratio p.value
#> Female - Male -0.0741 0.0413 Inf -1.794 0.0728
#>
#> Results are averaged over the levels of: FoodTreatment
emmeans::emmeans(mixmod.tmb, specs = pairwise ~ cond(SexParent), adjust = "Tukey", regrid = "response")
#> $emmeans
#> SexParent prob SE df asymp.LCL asymp.UCL
#> Female 0.684 0.0485 Inf 0.589 0.779
#> Male 0.758 0.0394 Inf 0.681 0.835
#>
#> Results are averaged over the levels of: FoodTreatment
#> Confidence level used: 0.95
#>
#> $contrasts
#> contrast estimate SE df z.ratio p.value
#> Female - Male -0.0741 0.0413 Inf -1.794 0.0728
#>
#> Results are averaged over the levels of: FoodTreatment
# For the "averaging" object, things are more tricky. For instance, model.frame() won't work
MuMIn::getAllTerms(mixmod.tmb.avg)
#> [1] "FoodTreatment" "poly(BroodSize, 2, raw = TRUE)"
#> [3] "SexParent"
#> attr(,"intercept")
#> [1] 0
#> attr(,"interceptLabel")
#> [1] "(Intercept)"
#> attr(,"response")
#> NCalls
#> attr(,"order")
#> [1] 1 2 3
#> attr(,"deps")
#> FoodTreatment poly(BroodSize, 2, raw = TRUE)
#> FoodTreatment NA FALSE
#> poly(BroodSize, 2, raw = TRUE) FALSE NA
#> SexParent FALSE FALSE
#> SexParent
#> FoodTreatment FALSE
#> poly(BroodSize, 2, raw = TRUE) FALSE
#> SexParent NA
model.frame(mixmod.tmb.avg)
#> Error in eval(predvars, data, env): objeto 'NCalls' no encontrado
# Data needs to be provided explicitly
emmeans::emmeans(mixmod.tmb.avg, specs = pairwise ~ SexParent, adjust = "Tukey", regrid = "response")
#> Error in model.frame.default(formula, data = data, ...) :
#> 'data' must be a data.frame, environment, or list
#> Error in (function (object, at, cov.reduce = mean, cov.keep = get_emm_option("cov.keep"), : Perhaps a 'data' or 'params' argument is needed
# The first impulse is to feed it the raw data, but it doesn't work
emmeans::emmeans(mixmod.tmb.avg, specs = pairwise ~ SexParent, adjust = "Tukey", regrid = "response", data = Owls)
#> Error in ME[, nms, drop = FALSE]: subíndice fuera de los límites
# I've tried multiple ways to feed the data to emmeans(), perhaps some are nonsense
emmeans::emmeans(mixmod.tmb.avg, specs = pairwise ~ SexParent, adjust = "Tukey", regrid = "response", data = model.matrix(mixmod.tmb.avg))
#> Error in eval(predvars, data, env): objeto 'FoodTreatment' no encontrado
emmeans::emmeans(mixmod.tmb.avg, specs = pairwise ~ SexParent, adjust = "Tukey", regrid = "response", data = model.frame(mixmod.tmb))
#> Error in poly(BroodSize, 2, raw = TRUE): objeto 'BroodSize' no encontrado
emmeans::emmeans(mixmod.tmb.avg, specs = pairwise ~ SexParent, adjust = "Tukey", regrid = "response", data = mixmod.tmb.avg$x)
#> Error in eval(predvars, data, env): objeto 'FoodTreatment' no encontrado
emmeans::emmeans(mixmod.tmb.avg, specs = pairwise ~ SexParent, adjust = "Tukey", regrid = "response", data = mixmod.tmb$frame)
#> Error in poly(BroodSize, 2, raw = TRUE): objeto 'BroodSize' no encontrado
##############
### Binomial GLM fitted with glmmTMB
mod.tmb <- glmmTMB(NCalls ~ SexParent + poly(BroodSize, 2, raw = TRUE) + FoodTreatment, data=Owls, family=binomial(link = "logit"))
# Model-averaging
mod.tmb.d <- MuMIn::dredge(mod.tmb, rank = "AICc")
#> Fixed terms are "cond((Int))" and "disp((Int))"
mod.tmb.avg <- MuMIn::model.avg(subset(mod.tmb.d, delta <= 6), revised.var = TRUE, fit = TRUE)
# emmeans
emmeans::emmeans(mod.tmb.avg, specs = pairwise ~ SexParent, adjust = "Tukey", regrid = "response", data = Owls)
#> Error in ME[, nms, drop = FALSE]: subíndice fuera de los límites
##############
### A mixed-effects binomial GLMM declared with lme4::glmer works as expected
# We get warnings about convergence, but emmeans performs as expected
glmer <- glmer(NCalls ~ SexParent + poly(BroodSize, 2, raw = TRUE) + FoodTreatment + (1|Nest), data=Owls, family=binomial(link = "logit"))
glmer.d <- MuMIn::dredge(glmer, rank = "AICc")
#> Warning in formals(fun): argument is not a function
#> Fixed term is "(Intercept)"
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.0035348 (tol = 0.002, component 1)
glmer.avg <- MuMIn::model.avg(subset(glmer.d, delta <= 6), revised.var = TRUE, fit = TRUE)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.0035348 (tol = 0.002, component 1)
emmeans::emmeans(glmer, specs = pairwise ~ SexParent, adjust = "Tukey", regrid = "response", data = Owls)
#> $emmeans
#> SexParent prob SE df asymp.LCL asymp.UCL
#> Female 0.684 0.0483 Inf 0.589 0.779
#> Male 0.758 0.0393 Inf 0.681 0.835
#>
#> Results are averaged over the levels of: FoodTreatment
#> Confidence level used: 0.95
#>
#> $contrasts
#> contrast estimate SE df z.ratio p.value
#> Female - Male -0.074 0.0411 Inf -1.802 0.0715
#>
#> Results are averaged over the levels of: FoodTreatment
emmeans::emmeans(glmer.avg, specs = pairwise ~ SexParent, adjust = "Tukey", regrid = "response", data = Owls)
#> $emmeans
#> SexParent prob SE df lower.CL upper.CL
#> Female 0.700 0.0505 593 0.601 0.800
#> Male 0.749 0.0417 593 0.667 0.831
#>
#> Results are averaged over the levels of: FoodTreatment
#> Confidence level used: 0.95
#>
#> $contrasts
#> contrast estimate SE df t.ratio p.value
#> Female - Male -0.0489 0.0481 593 -1.016 0.3101
#>
#> Results are averaged over the levels of: FoodTreatment
##############
### A binomial GLM declared with stats::glm works as expected
glm <- glm(NCalls ~ SexParent + poly(BroodSize, 2, raw = TRUE) + FoodTreatment, data=Owls, family=binomial(link = "logit"))
glm.d <- MuMIn::dredge(glm, rank = "AICc")
#> Warning in formals(fun): argument is not a function
#> Fixed term is "(Intercept)"
glm.avg <- MuMIn::model.avg(subset(glm.d, delta <= 6), revised.var = TRUE, fit = TRUE)
emmeans::emmeans(glm, specs = pairwise ~ SexParent, adjust = "Tukey", regrid = "response", data = Owls)
#> $emmeans
#> SexParent prob SE df asymp.LCL asymp.UCL
#> Female 0.668 0.0321 Inf 0.605 0.731
#> Male 0.760 0.0246 Inf 0.711 0.808
#>
#> Results are averaged over the levels of: FoodTreatment
#> Confidence level used: 0.95
#>
#> $contrasts
#> contrast estimate SE df z.ratio p.value
#> Female - Male -0.0913 0.0372 Inf -2.454 0.0141
#>
#> Results are averaged over the levels of: FoodTreatment
emmeans::emmeans(glm.avg, specs = pairwise ~ SexParent, adjust = "Tukey", regrid = "response", data = Owls)
#> $emmeans
#> SexParent prob SE df lower.CL upper.CL
#> Female 0.675 0.0359 594 0.604 0.745
#> Male 0.756 0.0268 594 0.703 0.808
#>
#> Results are averaged over the levels of: FoodTreatment
#> Confidence level used: 0.95
#>
#> $contrasts
#> contrast estimate SE df t.ratio p.value
#> Female - Male -0.0808 0.0456 594 -1.772 0.0769
#>
#> Results are averaged over the levels of: FoodTreatment
sessionInfo()
#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.utf8 LC_CTYPE=Spanish_Spain.utf8
#> [3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C
#> [5] LC_TIME=Spanish_Spain.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] forcats_0.5.1 stringr_1.5.0 dplyr_1.1.2 purrr_1.0.1
#> [5] readr_2.1.2 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.0
#> [9] tidyverse_1.3.1 MuMIn_1.47.5 lme4_1.1-29 Matrix_1.4-1
#> [13] glmmTMB_1.1.7 emmeans_1.8.9
#>
#> loaded via a namespace (and not attached):
#> [1] httr_1.4.3 jsonlite_1.8.0 splines_4.2.0
#> [4] modelr_0.1.8 assertthat_0.2.1 highr_0.9
#> [7] stats4_4.2.0 cellranger_1.1.0 yaml_2.3.5
#> [10] numDeriv_2016.8-1.1 pillar_1.9.0 backports_1.4.1
#> [13] lattice_0.20-45 glue_1.6.2 digest_0.6.33
#> [16] rvest_1.0.2 minqa_1.2.4 colorspace_2.0-3
#> [19] sandwich_3.0-1 htmltools_0.5.6 pkgconfig_2.0.3
#> [22] broom_0.8.0 haven_2.5.0 xtable_1.8-4
#> [25] mvtnorm_1.1-3 scales_1.2.1 tzdb_0.3.0
#> [28] generics_0.1.2 ellipsis_0.3.2 TH.data_1.1-1
#> [31] withr_2.5.0 TMB_1.9.6 cli_3.6.1
#> [34] crayon_1.5.1 readxl_1.4.0 survival_3.3-1
#> [37] magrittr_2.0.3 estimability_1.4.1 evaluate_0.15
#> [40] fs_1.5.2 fansi_1.0.3 nlme_3.1-157
#> [43] MASS_7.3-57 xml2_1.3.3 tools_4.2.0
#> [46] hms_1.1.1 lifecycle_1.0.3 multcomp_1.4-19
#> [49] munsell_0.5.0 reprex_2.0.2 compiler_4.2.0
#> [52] rlang_1.1.1 grid_4.2.0 nloptr_2.0.1
#> [55] rstudioapi_0.13 rmarkdown_2.14 boot_1.3-28
#> [58] gtable_0.3.0 codetools_0.2-18 DBI_1.1.2
#> [61] R6_2.5.1 zoo_1.8-10 lubridate_1.8.0
#> [64] knitr_1.39 fastmap_1.1.0 utf8_1.2.2
#> [67] stringi_1.7.6 Rcpp_1.0.11 vctrs_0.6.3
#> [70] dbplyr_2.1.1 tidyselect_1.2.0 xfun_0.40
#> [73] coda_0.19-4
Created on 2023-11-14 with reprex v2.0.2
Many thanks in advance!
This is due to the fact that some models have multiple portions and the coefficient names are typically prefixed or wrapped. In this case, we have:
> coef(mixmod.tmb.avg)
cond((Int)) cond(FoodTreatmentSatiated)
1.7410094 -1.8463991
cond(poly(BroodSize, 2, raw = TRUE)1) cond(poly(BroodSize, 2, raw = TRUE)2)
-0.5129341 0.1280694
cond(SexParentMale)
0.4261344
i.e., the coefficient names are wrapped with cond()
. This is provided for via the optional subset
argument -- see the [documentation]
(https://cran.r-project.org/web/packages/emmeans/vignettes/models.html#I) for the models vignette for averaging
objects. Using this we have
> emmeans(mixmod.tmb.avg, specs = pairwise ~ SexParent, adjust = "Tukey", regrid = "response", data = Owls,
+ subset = "wrap:cond")
$emmeans
SexParent prob SE df lower.CL upper.CL
Female 0.701 0.0506 593 0.601 0.800
Male 0.749 0.0418 593 0.667 0.832
Results are averaged over the levels of: FoodTreatment
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
Female - Male -0.0489 0.0483 593 -1.014 0.3112
Results are averaged over the levels of: FoodTreatment
That said, there was a glitch in the subset
code, so if using version 1.8.9 or earlier, you need to install emmeans from GitHub.
Hello! You are right. I just downloaded version 1.8.9-900001 from GitHub, and emmeans()
and emtrends()
work fine with the subset
argument you provided. I'll be closing this.
Thank you very much!