emmeans is giving oposite results for averaging than for each of the 8 models
Closed this issue · 11 comments
Hi Russell, I found an error diging more into the averaging issues, and I am still not sure if is a emmeans error or a MuMIn error, but I thought it would be good to notify you so this is the same dataset and analysis, again all data is on this repo this repo
first we load the needed r packages and the dataset
library(emmeans)
library(lme4)
library(MuMIn)
library(doParallel)
Data <- readRDS("Data.rds")
Now we fit a general model:
Model <- glmer(richness ~ aspect + elevation +
initial_habitat +
I(abs(year - 1)) +
I((year - 1)^2) +
slope +
treatment:initial_habitat +
year:initial_habitat +
year:treatment +
year:treatment:initial_habitat +
(1 | block_no), family = poisson, data = Data, control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 100000)))
And we make a model selection (skip this step since it takes a while, the “SelectRichness.rds” file is in this github)
options(na.action = "na.fail")
library(doParallel)
cl <- makeCluster(4)
registerDoParallel(cl)
clusterEvalQ(cl, library(lme4))
clusterExport(cl, "Data")
Select <- MuMIn::pdredge(Model, extra = list(R2m = function(x) r.squaredGLMM(x)[1, 1], R2c = function(x) r.squaredGLMM(x)[1, 2]), fixed = ~ YEAR:Treatment, cluster = cl)
saveRDS(Select, "SelectRichness.rds")
And now we Select the best models, I will do this twice, since the outcome of the subset
function will be used to show how the best model does not have issues and the averaged model from get.models
which is the result I need is not working
Select <- readRDS("SelectRichness.rds")
SelectedList <- get.models(Select, delta < 2)
Lets see all models
So now, if we look at each model we get this
SelectedList <- get.models(Select, delta < 2)
# Calculate emmeans fore each of the 8 models
ListEMM <- SelectedList |>
purrr::map(~ emmeans(.x, ~ year * initial_habitat * treatment, at = list(year = 4), data = Data, type = "response"))
# Then get the pairs table for each model
TablePairs <- ListEMM |>
purrr::map(~ pairs(.x, simple = "treatment")) |>
purrr::map(as.data.frame)
## add a model name
for (i in 1:length(TablePairs)) {
TablePairs[[i]]$Model <- paste("Model", i)
}
# Join all tables and show it
TablePairs <- TablePairs |> purrr::reduce(rbind)
As you can see in this table we have significant values for all comparisons, where all ratios are bellow 1, pointing at PermanentExclosure having lower richness than Control
contrast | year | initial_habitat | ratio | SE | df | null | z.ratio | p.value | Model |
---|---|---|---|---|---|---|---|---|---|
PermanentExclosure / Control | 4 | Forest | 0.8030333 | 0.0422571 | Inf | 1 | -4.168592 | 3.06e-05 | Model 1 |
PermanentExclosure / Control | 4 | Meadow | 0.8030333 | 0.0422571 | Inf | 1 | -4.168592 | 3.06e-05 | Model 1 |
PermanentExclosure / Control | 4 | Rangeland | 0.8030333 | 0.0422571 | Inf | 1 | -4.168592 | 3.06e-05 | Model 1 |
PermanentExclosure / Control | 4 | Forest | 0.7964861 | 0.0420561 | Inf | 1 | -4.309411 | 1.64e-05 | Model 2 |
PermanentExclosure / Control | 4 | Meadow | 0.7964861 | 0.0420561 | Inf | 1 | -4.309411 | 1.64e-05 | Model 2 |
PermanentExclosure / Control | 4 | Rangeland | 0.7964861 | 0.0420561 | Inf | 1 | -4.309411 | 1.64e-05 | Model 2 |
PermanentExclosure / Control | 4 | Forest | 0.8031875 | 0.0421844 | Inf | 1 | -4.172918 | 3.01e-05 | Model 3 |
PermanentExclosure / Control | 4 | Meadow | 0.8031875 | 0.0421844 | Inf | 1 | -4.172918 | 3.01e-05 | Model 3 |
PermanentExclosure / Control | 4 | Rangeland | 0.8031875 | 0.0421844 | Inf | 1 | -4.172918 | 3.01e-05 | Model 3 |
PermanentExclosure / Control | 4 | Forest | 0.7966193 | 0.0419842 | Inf | 1 | -4.314336 | 1.60e-05 | Model 4 |
PermanentExclosure / Control | 4 | Meadow | 0.7966193 | 0.0419842 | Inf | 1 | -4.314336 | 1.60e-05 | Model 4 |
PermanentExclosure / Control | 4 | Rangeland | 0.7966193 | 0.0419842 | Inf | 1 | -4.314336 | 1.60e-05 | Model 4 |
PermanentExclosure / Control | 4 | Forest | 0.8006527 | 0.0422373 | Inf | 1 | -4.214460 | 2.50e-05 | Model 5 |
PermanentExclosure / Control | 4 | Meadow | 0.8006527 | 0.0422373 | Inf | 1 | -4.214460 | 2.50e-05 | Model 5 |
PermanentExclosure / Control | 4 | Rangeland | 0.8006527 | 0.0422373 | Inf | 1 | -4.214460 | 2.50e-05 | Model 5 |
PermanentExclosure / Control | 4 | Forest | 0.8039527 | 0.0423231 | Inf | 1 | -4.145121 | 3.40e-05 | Model 6 |
PermanentExclosure / Control | 4 | Meadow | 0.8039527 | 0.0423231 | Inf | 1 | -4.145121 | 3.40e-05 | Model 6 |
PermanentExclosure / Control | 4 | Rangeland | 0.8039527 | 0.0423231 | Inf | 1 | -4.145121 | 3.40e-05 | Model 6 |
PermanentExclosure / Control | 4 | Forest | 0.7939884 | 0.0420386 | Inf | 1 | -4.357007 | 1.32e-05 | Model 7 |
PermanentExclosure / Control | 4 | Meadow | 0.7939884 | 0.0420386 | Inf | 1 | -4.357007 | 1.32e-05 | Model 7 |
PermanentExclosure / Control | 4 | Rangeland | 0.7939884 | 0.0420386 | Inf | 1 | -4.357007 | 1.32e-05 | Model 7 |
PermanentExclosure / Control | 4 | Forest | 0.8008168 | 0.0421648 | Inf | 1 | -4.218685 | 2.46e-05 | Model 8 |
PermanentExclosure / Control | 4 | Meadow | 0.8008168 | 0.0421648 | Inf | 1 | -4.218685 | 2.46e-05 | Model 8 |
PermanentExclosure / Control | 4 | Rangeland | 0.8008168 | 0.0421648 | Inf | 1 | -4.218685 | 2.46e-05 | Model 8 |
Working with the average model gives oposite result
If we go with the model averaging, we see the opposite result
AV <- model.avg(SelectedList, fit = TRUE, random = TRUE)
noise.emm <- emmeans(AV, ~ year * initial_habitat * treatment, at = list(year = 4), data = Data, type = "response")
AverageTable <- pairs(noise.emm, simple = "treatment")
We can see that the table gives the opposite result, where all comparisons have a ration over 1 with PermanentExclosure having higher richness than control, but it is not significant.
knitr::kable(AverageTable)
contrast | year | initial_habitat | ratio | SE | df | null | t.ratio | p.value |
---|---|---|---|---|---|---|---|---|
PermanentExclosure / Control | 4 | Forest | 1.070482 | 0.2635936 | 233 | 1 | 0.2765966 | 0.7823351 |
PermanentExclosure / Control | 4 | Meadow | 1.070482 | 0.2635936 | 233 | 1 | 0.2765966 | 0.7823351 |
PermanentExclosure / Control | 4 | Rangeland | 1.070482 | 0.2635936 | 233 | 1 | 0.2765966 | 0.7823351 |
Standard output and standard error
`/home/au687614/Documents/emmeansProblem/Problem_std_out_err.txt`Session info
sessioninfo::session_info()
#;-) ─ Session info ───────────────────────────────────────────────────────────
#;-) setting value
#;-) version R version 4.2.2 Patched (2022-11-10 r83330)
#;-) os Ubuntu 20.04.5 LTS
#;-) system x86_64, linux-gnu
#;-) ui RStudio
#;-) language en_US:en
#;-) collate en_US.UTF-8
#;-) ctype en_US.UTF-8
#;-) tz Europe/Copenhagen
#;-) date 2023-03-13
#;-) rstudio 2022.07.2+576 Spotted Wakerobin (desktop)
#;-) pandoc 2.19.2 @ /usr/lib/rstudio/bin/quarto/bin/tools/ (via rmarkdown)
#;-)
#;-) ─ Packages ───────────────────────────────────────────────────────────────
#;-) package * version date (UTC) lib source
#;-) boot 1.3-28 2021-05-03 [4] CRAN (R 4.0.5)
#;-) bslib 0.4.2 2022-12-16 [1] CRAN (R 4.2.2)
#;-) cachem 1.0.7 2023-02-24 [3] CRAN (R 4.2.2)
#;-) cli 3.6.0 2023-01-09 [1] CRAN (R 4.2.2)
#;-) cluster 2.1.4 2022-08-22 [4] CRAN (R 4.2.1)
#;-) coda 0.19-4 2020-09-30 [3] CRAN (R 4.0.2)
#;-) codetools 0.2-19 2023-02-01 [4] CRAN (R 4.2.2)
#;-) digest 0.6.31 2022-12-11 [1] CRAN (R 4.2.2)
#;-) doParallel * 1.0.17 2022-02-07 [1] CRAN (R 4.2.1)
#;-) dplyr 1.1.0 2023-01-29 [1] CRAN (R 4.2.2)
#;-) emmeans * 1.8.4-90005 2023-03-07 [1] Github (rvlenth/emmeans@2b0273d)
#;-) estimability 1.4.1 2022-08-05 [1] CRAN (R 4.2.1)
#;-) evaluate 0.20 2023-01-17 [1] CRAN (R 4.2.2)
#;-) fansi 1.0.4 2023-01-22 [3] CRAN (R 4.2.2)
#;-) fastmap 1.1.1 2023-02-24 [3] CRAN (R 4.2.2)
#;-) foreach * 1.5.2 2022-02-02 [1] CRAN (R 4.2.1)
#;-) fs 1.6.1 2023-02-06 [3] CRAN (R 4.2.2)
#;-) generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.1)
#;-) glue 1.6.2 2022-02-24 [3] CRAN (R 4.1.2)
#;-) htmltools 0.5.4 2022-12-07 [1] CRAN (R 4.2.2)
#;-) iterators * 1.0.14 2022-02-05 [1] CRAN (R 4.2.1)
#;-) janitor 2.1.0 2021-01-05 [1] CRAN (R 4.2.1)
#;-) jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.2.0)
#;-) jsonlite 1.8.4 2022-12-06 [1] CRAN (R 4.2.2)
#;-) knitr 1.42 2023-01-25 [1] CRAN (R 4.2.2)
#;-) lattice 0.20-45 2021-09-22 [4] CRAN (R 4.2.0)
#;-) lifecycle 1.0.3 2022-10-07 [3] CRAN (R 4.2.1)
#;-) lme4 * 1.1-31 2022-11-01 [1] CRAN (R 4.2.2)
#;-) lubridate 1.9.0 2022-11-06 [1] CRAN (R 4.2.2)
#;-) magrittr 2.0.3 2022-03-30 [3] CRAN (R 4.1.3)
#;-) MASS 7.3-58.2 2023-01-23 [4] CRAN (R 4.2.2)
#;-) Matrix * 1.5-3 2022-11-11 [4] CRAN (R 4.2.2)
#;-) mgcv 1.8-42 2023-03-02 [4] CRAN (R 4.2.2)
#;-) minqa 1.2.5 2022-10-19 [3] CRAN (R 4.2.1)
#;-) multcomp 1.4-20 2022-08-07 [1] CRAN (R 4.2.1)
#;-) MuMIn * 1.47.3 2023-02-28 [1] R-Forge (R 4.2.2)
#;-) mvtnorm 1.1-3 2021-10-08 [1] CRAN (R 4.2.1)
#;-) nlme 3.1-162 2023-01-31 [4] CRAN (R 4.2.2)
#;-) nloptr 2.0.3 2022-05-26 [3] CRAN (R 4.2.0)
#;-) permute 0.9-7 2022-01-27 [1] CRAN (R 4.2.0)
#;-) pillar 1.8.1 2022-08-19 [1] CRAN (R 4.2.1)
#;-) pkgconfig 2.0.3 2019-09-22 [3] CRAN (R 4.0.0)
#;-) purrr 1.0.1 2023-01-10 [1] CRAN (R 4.2.2)
#;-) R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.2.1)
#;-) R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.2.1)
#;-) R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.2.1)
#;-) R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.2.2)
#;-) R6 2.5.1 2021-08-19 [3] CRAN (R 4.1.1)
#;-) Rcpp 1.0.10 2023-01-22 [3] CRAN (R 4.2.2)
#;-) reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.2)
#;-) rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.1)
#;-) rmarkdown 2.20 2023-01-19 [1] CRAN (R 4.2.2)
#;-) rsconnect 0.8.28 2022-10-24 [1] CRAN (R 4.2.2)
#;-) rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.1)
#;-) sandwich 3.0-2 2022-06-15 [1] CRAN (R 4.2.1)
#;-) sass 0.4.5 2023-01-24 [1] CRAN (R 4.2.2)
#;-) sessioninfo 1.2.2 2021-12-06 [3] CRAN (R 4.1.2)
#;-) snakecase 0.11.0 2019-05-25 [1] CRAN (R 4.2.1)
#;-) stringi 1.7.12 2023-01-11 [1] CRAN (R 4.2.2)
#;-) stringr 1.5.0 2022-12-02 [1] CRAN (R 4.2.2)
#;-) styler 1.8.1 2022-11-07 [1] CRAN (R 4.2.2)
#;-) survival 3.5-3 2023-02-12 [4] CRAN (R 4.2.2)
#;-) TH.data 1.1-1 2022-04-26 [1] CRAN (R 4.2.1)
#;-) tibble 3.1.8 2022-07-22 [1] CRAN (R 4.2.1)
#;-) tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.1)
#;-) timechange 0.1.1 2022-11-04 [1] CRAN (R 4.2.2)
#;-) TMB 1.9.2 2023-01-23 [1] CRAN (R 4.2.2)
#;-) utf8 1.2.3 2023-01-31 [3] CRAN (R 4.2.2)
#;-) vctrs 0.5.2 2023-01-23 [1] CRAN (R 4.2.2)
#;-) vegan 2.6-4 2022-10-11 [1] CRAN (R 4.2.1)
#;-) withr 2.5.0 2022-03-03 [3] CRAN (R 4.1.3)
#;-) xfun 0.37 2023-01-31 [1] CRAN (R 4.2.2)
#;-) xtable 1.8-4 2019-04-21 [1] CRAN (R 4.2.0)
#;-) yaml 2.3.7 2023-01-23 [3] CRAN (R 4.2.2)
#;-) zoo 1.8-11 2022-09-17 [1] CRAN (R 4.2.1)
#;-)
#;-) [1] /home/au687614/R/x86_64-pc-linux-gnu-library/4.2
#;-) [2] /usr/local/lib/R/site-library
#;-) [3] /usr/lib/R/site-library
#;-) [4] /usr/lib/R/library
#;-)
#;-) ──────────────────────────────────────────────────────────────────────────
Before I look at this, is the same SelectRichness.rds
file as before? If so, I won't need to download it. If not, it should have a different name.
BTW, I am not asking for a re-do, but in future reporting of bugs to me or another person, it is best to avoid elaborating things as much as possible. Just straight calls please to the functions that are needed to reproduce the bug. Pre- and post-processing through pipes, dplyr()
, kable()
, etc. are just extra, unneeded steps and mask the actual objects produced by my package functions.
Hey Thank you Russell, I pushed the new Select.Richness.rds right now, sorry about the piping, I tried to keep it at a minimum this time, if there is a new issue I will avoid it entirely
NO!!!! If SelectRichness
has changed, please, please do not give it the same name. Because you have now probably made issue 402 non-reproducible.
This is what I am seeing based on the original SelectRichness
from #402 -- just using model predictions for a common grid:
> ref_grid(AV, data = Data)
'emmGrid' object with variables:
aspect = 125
elevation = 37.718
year = 1.9918
initial_habitat = Forest, Meadow, Rangeland
treatment = PermanentExclosure, Control
Transformation: “log”
> grid = transform(noise.emm@grid, aspect = 125, elevation = 37.718, block_no = NA)
> grid
year initial_habitat treatment .wgt. aspect elevation block_no
1 4 Forest PermanentExclosure 50 125 37.718 NA
2 4 Meadow PermanentExclosure 15 125 37.718 NA
3 4 Rangeland PermanentExclosure 70 125 37.718 NA
4 4 Forest Control 25 125 37.718 NA
5 4 Meadow Control 14 125 37.718 NA
6 4 Rangeland Control 70 125 37.718 NA
> preds = sapply(SelectedList, \(m) predict(m, newdata = grid, re.form = NA, type = "response"))
> preds
275 403 274 402 283 279 411 282
1 7.825350 8.49674 8.057187 8.73896 8.158278 7.764152 8.869441 8.399967
2 14.208545 14.67030 14.646058 15.14676 15.814345 14.649633 16.364545 16.301204
3 16.670437 16.13882 17.164182 16.61635 15.902183 16.644227 15.377029 16.373395
4 9.744739 10.66778 10.031513 10.97005 10.189530 9.657477 11.170740 10.489261
5 17.693594 18.41878 18.234917 19.01379 19.751807 18.222015 20.610552 20.355744
6 20.759335 20.26253 21.370080 20.85857 19.861515 20.703000 19.366812 20.445891
> (avgpred = apply(preds, 1, mean))
1 2 3 4 5 6
8.288759 15.225174 16.360828 10.365137 19.037650 20.453468
> predict(AV, newdata = grid, re.form = NA, type = "response")
1 2 3 4 5 6
8.252539 15.003734 16.452004 10.317070 18.755675 20.562313
> ### These are the same results ###
> fun = function(x) c(r1 = x[1]/x[4], r2 = x[2]/x[5], r3 = x[3]/x[6])
> apply(preds, 2, fun)
275 403 274 402 283 279 411 282
r1.1 0.8030333 0.796486 0.8031876 0.7966197 0.8006531 0.8039524 0.7939887 0.8008159
r2.2 0.8030333 0.796486 0.8031876 0.7966197 0.8006531 0.8039524 0.7939887 0.8008159
r3.3 0.8030333 0.796486 0.8031876 0.7966197 0.8006531 0.8039524 0.7939887 0.8008159
> fun(avgpred)
r1.1 r2.2 r3.3
0.7996768 0.7997402 0.7999049
On the other hand, if I use noise.emm
, I do get different results:
> predict(noise.emm)
[1] 11.03103 20.05298 22.00366 10.30473 18.73265 20.55491
> ### calculated "manually"
> as.numeric(exp(noise.emm@linfct %*% noise.emm@bhat))
[1] 11.03103 20.05298 22.00366 10.30473 18.73265 20.55491
> fun(predict(noise.emm))
r1 r2 r3
1.070482 1.070482 1.070482
> ### The linear functions and coefs look right:
> noise.emm@linfct
(Intercept) aspect elevation I((year - 1)^2) I(abs(year - 1)) initial_habitatMeadow initial_habitatRangeland initial_habitatForest:year initial_habitatMeadow:year
[1,] 1 125 37.71776 9 3 0 0 4 0
[2,] 1 125 37.71776 9 3 1 0 0 4
[3,] 1 125 37.71776 9 3 0 1 0 0
[4,] 1 125 37.71776 9 3 0 0 4 0
[5,] 1 125 37.71776 9 3 1 0 0 4
[6,] 1 125 37.71776 9 3 0 1 0 0
initial_habitatRangeland:year year:treatmentControl
[1,] 0 0
[2,] 0 0
[3,] 4 0
[4,] 0 4
[5,] 0 4
[6,] 4 4
> noise.emm@bhat
(Intercept) aspect elevation I((year - 1)^2) I(abs(year - 1)) initial_habitatMeadow
2.155205e+00 2.044456e-05 7.891242e-04 -1.824446e-02 1.649255e-01 6.357188e-01
initial_habitatRangeland initial_habitatForest:year initial_habitatMeadow:year initial_habitatRangeland:year treatmentControl:year
7.759404e-01 -2.934701e-02 -3.886043e-02 -5.070798e-02 -1.702733e-02
> coef(AV, full = TRUE)
(Intercept) I((year - 1)^2) I(abs(year - 1)) initial_habitatMeadow initial_habitatRangeland
2.155205e+00 -1.824446e-02 1.649255e-01 6.357188e-01 7.759404e-01
treatmentPermanentExclosure:year treatmentControl:year initial_habitatForest:year initial_habitatMeadow:year initial_habitatRangeland:year
-7.279999e-02 -1.702733e-02 -2.934701e-02 -3.886043e-02 -5.070798e-02
treatmentControl:year elevation aspect
-1.702733e-02 7.891242e-04 2.044456e-05
> length(noise.emm@bhat)
[1] 11
> length(coef(AV, full = TRUE))
[1] 13
> length(unique(names(coef(AV, full = TRUE))))
[1] 12
The question is why; and I'm not quite sure because everything seems right. except for the last bit, where the coefs are mismatched. One more thing:
> setdiff(names(coef(AV, full = TRUE)), names(noise.emm@bhat))
[1] "treatmentPermanentExclosure:year"
So apparently the issue is that the above coefficient is omitted in noise.emm
I'm not sure how to resolve this. And BTW I don't know what is supposed to be the effect of random = TRUE
in the call to model.avg
.
Just to clarify my previous comment. I'd say the gold standard for implementation of emm_basis
methods is that it produces predictions identical to those obtained via predict(model, newdata = ref_grid(model)@grid)
. My tests prove that the predictions obtained this way are identical to the average of the predictions from the individual models, and that this is different from predict(ref_grid(model))
, So the test fails.
Not knowing how to fix it does not mitigate that fact.
I'm going to take back some of what I said above. The gold standard I refer to must not be real gold, because it has some tarnish on it! Namely, if you follow what is done in my tests, we produced predictions from the separate models and averaged them together, and that process produced exactly the same results as predict(AV, newdata = grid, re.form = NA, type = "response")
. Read every detail of the previous call...
- It doesn't work without
re.form = NA
. That's because it needs that argument. That's a clue. - It says
type = "response"
. I should have noticed earlier that we averaged those predictions on the response scale, not on the link scale.
Because the link function is the log, which is nonlinear, it is impossible to develop a linear predictor for the link scale that will back-transform to exactly the rersults we had on the response scale. For that reason, it is clear there is something shady about predict.averaging()
. Looking at its code, we find that the returned value we obtained was obtained exactly the way I did in my tests: obtain the predictions for each model (passing the needed argument re.form = NA
and type = "response"
) and average them together.
So I think the so-called gold standard I mention is a sham, because there is no way those results could be obtained using the model coefficients. And it makes me wonder what we even mean by model averaging and whether the implementation of predictions in MuMIn is correct. Specifically, note that
> exp(predict(AV, newdata = grid, re.form=NA, type = "link"))
1 2 3 4 5 6
8.244221 14.986917 16.444795 10.304731 18.732655 20.554906
These are different results than obtained before, showing that predict.averaging()
is inconsistent with itself, assuming we really are talking about a single averaged model that has its own regression coefficients.
The code for predict.averaging
has a clause like this:
if ((missing(se.fit) || !se.fit) && (is.na(type) || type ==
"link") && !backtransform && all(linherits(models, c(gam = FALSE,
lm = TRUE))) && !anyNA(object$coefficients[1L, ])) {
... code that does it kind of like what emmeans does ...
}
I ran this in debug mode and the if
condition tests FALSE
but still I could run the code it skipped over. I obtained exactly the same model matrix as in the @linfct
slot of noise.emm
, and exactly the same estimates as was obtained in my tests above from emmeans()
.
So I don't know what to think, other than I'm less prone to believe I did it wrong.
... and not inclined to believe that "I can fix it."
OK, here's something to ponder...
> colnames(model.matrix(AV))
[1] "(Intercept)" "I((year - 1)^2)"
[3] "I(abs(year - 1))" "initial_habitatMeadow"
[5] "initial_habitatRangeland" "treatmentPermanentExclosure:year"
[7] "treatmentControl:year" "initial_habitatForest:year"
[9] "initial_habitatMeadow:year" "initial_habitatRangeland:year"
[11] "year:treatmentControl" "elevation"
[13] "aspect"
> colnames(model.matrix(terms(delete.response(formula(AV))), data = Data))
[1] "(Intercept)" "aspect"
[3] "elevation" "I((year - 1)^2)"
[5] "I(abs(year - 1))" "initial_habitatMeadow"
[7] "initial_habitatRangeland" "initial_habitatForest:year"
[9] "initial_habitatMeadow:year" "initial_habitatRangeland:year"
[11] "year:treatmentControl"
The first lists the names of the regression coefficients that can potentially appear in a model. The second is the column names of the model matrix generated by the model's terms
component. They don't match; and in particular, the latter excludes treatmentPermanentExclosure:year
. I think this happens because the model formula in the individual models lacks a main effect of treatment
. But in any case, the bottom line is that the models are parameterized differently, meaning that we are using the wrong coefficients.
By golly, I think I'm getting close:
> RG = ref_grid(AV, at = list(year = 4), data = Data, type = "response")
> (noise.emm = emmeans(RG, ~ year * initial_habitat * treatment))
year initial_habitat treatment rate SE df lower.CL upper.CL
4 Forest PermanentExclosure 8.24 1.25 233 6.11 11.1
4 Meadow PermanentExclosure 14.99 2.99 233 10.11 22.2
4 Rangeland PermanentExclosure 16.44 1.63 233 13.53 20.0
4 Forest Control 9.63 3.02 233 5.19 17.9
4 Meadow Control 17.50 5.57 233 9.34 32.8
4 Rangeland Control 19.20 4.67 233 11.90 31.0
Confidence level used: 0.95
Intervals are back-transformed from the log scale
> summary(pairs(noise.emm, simple = "treatment"), by = NULL)
Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons
contrast year initial_habitat ratio SE df null t.ratio p.value
PermanentExclosure / Control 4 Forest 0.856 0.209 233 1 -0.634 0.8940
PermanentExclosure / Control 4 Meadow 0.856 0.209 233 1 -0.634 0.8940
PermanentExclosure / Control 4 Rangeland 0.856 0.209 233 1 -0.634 0.8940
P value adjustment: sidak method for 3 tests
Tests are performed on the log scale
What I did was, instead of creating a model matrix and matching coefficients to its columns, I did just the opposite by creating a model matrix column corresponding to the name of each regression coefficient -- including any duplicates.
In the above output, we find that the first 3 estimates are equal to the results of exp(predict(AV, newdata = grid, re.form=NA, type = "link"))
shown in a previous comment. The 4--6th estimates are somewhat smaller, so somehow the effect of treatmentControl
is coming out wrong. I'll see if I can figure it out.
OK, we've got it. Starting with the above,
n.emm = noise.emm
> n.emm@bhat[11] = 0 # zero out the duplicated coefficient
> n.emm
year initial_habitat treatment rate SE df lower.CL upper.CL
4 Forest PermanentExclosure 8.24 1.25 233 6.11 11.1
4 Meadow PermanentExclosure 14.99 2.99 233 10.11 22.2
4 Rangeland PermanentExclosure 16.44 1.63 233 13.53 20.0
4 Forest Control 10.30 3.23 233 5.55 19.1
4 Meadow Control 18.73 5.97 233 10.00 35.1
4 Rangeland Control 20.55 4.99 233 12.74 33.2
Confidence level used: 0.95
Intervals are back-transformed from the log scale
> summary(pairs(n.emm, simple = "treatment"), by = NULL)
Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons
contrast year initial_habitat ratio SE df null t.ratio p.value
PermanentExclosure / Control 4 Forest 0.8 0.196 233 1 -0.913 0.7407
PermanentExclosure / Control 4 Meadow 0.8 0.196 233 1 -0.913 0.7407
PermanentExclosure / Control 4 Rangeland 0.8 0.196 233 1 -0.913 0.7407
P value adjustment: sidak method for 3 tests
Tests are performed on the log scale
Now all 6 estimates match those using the predict
function. So the key is simply to omit the duplicates.
I have done this and pushed up the revised package. If you re-install emmeans from GitHub, it should work right. This issue was a tough one! Thanks for reporting it and thanks for your patience.
Hey Russel this is amazing you really did it!! thanks for all your work