extension of 'plot prediction intervals? #131'
lena-bauer opened this issue · 10 comments
Hello,
emmip() plots with CI=T and PI=T together overrides the type="response" for others than LM
poisson:
data(Salamanders, package="glmmTMB")
m1 <- glmmTMB(count ~ mined + (1|site),
zi=~mined,
family=poisson, data=Salamanders)
emmeans_m1<-emmeans(m1, ~ mined, type="response")
emmip(m1, ~mined, CI=T, type="response")
emmip(m1, ~mined, PI=T, type="response")
emmip(m1, ~mined, CI=T, PI=T, type="response")
binomial
data(cbpp, package="lme4")
bovine <- glmmTMB(cbind(incidence, size-incidence) ~ period + (1|herd),
family=binomial, data=cbpp)
emmeans_bovine<-emmeans(bovine, ~ period, type="response")
emmip(bovine, ~period, CI=T, type="response")
emmip(bovine, ~period, PI=T, type="response")
emmip(bovine, ~period, CI=T, PI=T, type="response")
I thought, that PI should be 'arround' the CI, so obviously CI and PI are plotted for different y-axis scales
Thank you
Hmmm, Unfortunately, there is more than one issue here.
For the first model, If I do emmip(emmeans_m1, ~ mined, CI = TRUE, PI = TRUE)
, I get
and numerically, these are the values that are plotted:
> emmip(emmeans_m1, ~ mined, CI = TRUE, PI = TRUE, plotit = FALSE)
mined yvar SE df LCL UCL LPL UPL tvar xvar
yes 1.09 0.254 Inf 0.692 1.72 0.146 8.17 1 yes
no 3.42 0.311 Inf 2.862 4.09 0.478 24.48 1 no
Confidence level used: 0.95
Intervals are back-transformed from the log scale
These do correspond to what we see with predict(emmeans_m1, interval = "conf")
and predict(emmeans_m1, interval = "pred")
. But if I start from scratch with the model, I get what you experienced:
> emmip(m1, ~ mined, CI = TRUE, PI = TRUE, type = "response", plotit = FALSE)
mined yvar SE df LCL UCL LPL UPL tvar xvar
yes 1.09 0.254 Inf 0.692 1.72 -1.925 2.1 1 yes
no 3.42 0.311 Inf 2.862 4.09 -0.738 3.2 1 no
Confidence level used: 0.95
Intervals are back-transformed from the log scale
So we have:
Issue 1: emmip works differently depending on whether 'object' is an emmGrid or a fitted model
But we also have, as is discussed below:
Issue 2: Prediction intervals are not computed correctly except for normal families
This is a huge gaffe on my part; I simply was not thinking broadly enough when I wrote that code. For starters, note this:
> sigma(m1)
[1] 1
This is sort of a "hail Mary" result (and really I think it should return NA
). And an implication is that we used sigma = 1
to produce the PIs shown above, both the incorrect ones and the "correct" ones. All of those PIs are completely wrong!
For a GLM, there is no such thing as a residual SD. Even with a random effect in there, we do not have a sensible concept of residual SD for prediction purposes, because what we should be doing instead is some kind of 95% interval of the fitted Poisson distribution, and somehow widening that interval based on the variations in site
.
To tell you the truth, I don't have an appetite for trying to make sensible prediction intervals based on GLMMs. To my knowledge (which is definitely limited), there is no literature on this except for Bayesian models where we can produce posterior predictive distributions. Look in the vignette index and you will find an example, and that is the approach I recommend if you really want to do predictions. My approach for frequentist GLMs and GLMMs will be to simply disable the ability to make prediction intervals for these models.
I anticipate that Issue 1 will be fairly simple to fix, but Issue 2 could be a mess. I'll keep you posted...
Ah, I just looked back at #131 and that's where we encountered possible bronie users. How could I forget that???
Ok, thank you.
OK, I think I have the plot-related and prediction-related issues solved.
Regarding predictions (Issue 2), I was able to shoehorn an appropriate remedy into the standard code that is used for managing link functions, and incidentally already has family information. that makes it possible to fairly reliably determine when PIs are appropriate.
> emm1 <- emmeans(m1, ~mined, type = "resp")
> predict(emm1, interval = "conf")
mined rate SE df asymp.LCL asymp.UCL
yes 1.09 0.254 Inf 0.692 1.72
no 3.42 0.311 Inf 2.862 4.09
Confidence level used: 0.95
Intervals are back-transformed from the log scale
> predict(emm1, interval = "pred")
NULL
Warning message:
Prediction intervals are not available for this object
> emmip(m1, ~ mined, CIs = TRUE, PIs = TRUE)
Warning message:
Prediction intervals are not available for this object
> # (but the resulting plot shows the CIs)
You can't see the fix for the other issue with this model because PIs are disabled, butIssue 1 has been fixed too:
> pigs.lm <- lm(inverse(conc) ~ source + factor(percent), data = pigs)
> emmip(pigs.lm, ~ source, CIs = TRUE, PIs = TRUE, type = "resp")
This works similarly with plot.emmGrid
.
The care in checking for appropriateness of predictions also has a side effect for cases where we use bias.adj = TRUE
, which requires a sigma
value. Previously, it just blithely used sigma = sigma(model)
which is almost always wrong with GLMs and GLMMs. Now it finds object@misc$sigma = NA
so it can't do the bias correction unless you specify sigma
. That is really as it should be, but I still need to work through that situation and issue appropriate warnings.
BTW, thanks for reporting this! On the brony thing, I had just been joking around with @mattansb about an earlier color palette choice that reminded me of My Little Pony books and products.
Thank you very much for that fast reply, debugging and clarifcation.
By the way, I also like colors, but no pink and glitter :) And now, i know what a brony is. Learned something!
Some newly acquired knowledge is more useful than others...
I think this issue is resolved, so am closing it.