rvlenth/emmeans

extension of 'plot prediction intervals? #131'

lena-bauer opened this issue · 10 comments

Hello,
emmip() plots with CI=T and PI=T together overrides the type="response" for others than LM

poisson:
data(Salamanders, package="glmmTMB")
m1 <- glmmTMB(count ~ mined + (1|site),
zi=~mined,
family=poisson, data=Salamanders)

emmeans_m1<-emmeans(m1, ~ mined, type="response")

emmip(m1, ~mined, CI=T, type="response")
emmip(m1, ~mined, PI=T, type="response")
emmip(m1, ~mined, CI=T, PI=T, type="response")

binomial
data(cbpp, package="lme4")
bovine <- glmmTMB(cbind(incidence, size-incidence) ~ period + (1|herd),
family=binomial, data=cbpp)

emmeans_bovine<-emmeans(bovine, ~ period, type="response")

emmip(bovine, ~period, CI=T, type="response")
emmip(bovine, ~period, PI=T, type="response")
emmip(bovine, ~period, CI=T, PI=T, type="response")

I thought, that PI should be 'arround' the CI, so obviously CI and PI are plotted for different y-axis scales

Thank you

Hmmm, Unfortunately, there is more than one issue here.

For the first model, If I do emmip(emmeans_m1, ~ mined, CI = TRUE, PI = TRUE), I get
image

and numerically, these are the values that are plotted:

> emmip(emmeans_m1, ~ mined, CI = TRUE, PI = TRUE, plotit = FALSE)
 mined yvar    SE  df   LCL  UCL   LPL   UPL tvar xvar
 yes   1.09 0.254 Inf 0.692 1.72 0.146  8.17 1    yes 
 no    3.42 0.311 Inf 2.862 4.09 0.478 24.48 1    no  

Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

These do correspond to what we see with predict(emmeans_m1, interval = "conf") and predict(emmeans_m1, interval = "pred"). But if I start from scratch with the model, I get what you experienced:

> emmip(m1, ~ mined, CI = TRUE, PI = TRUE, type = "response", plotit = FALSE)
 mined yvar    SE  df   LCL  UCL    LPL UPL tvar xvar
 yes   1.09 0.254 Inf 0.692 1.72 -1.925 2.1 1    yes 
 no    3.42 0.311 Inf 2.862 4.09 -0.738 3.2 1    no  

Confidence level used: 0.95 
Intervals are back-transformed from the log scale

So we have:

Issue 1: emmip works differently depending on whether 'object' is an emmGrid or a fitted model

But we also have, as is discussed below:

Issue 2: Prediction intervals are not computed correctly except for normal families

This is a huge gaffe on my part; I simply was not thinking broadly enough when I wrote that code. For starters, note this:

> sigma(m1)
[1] 1

This is sort of a "hail Mary" result (and really I think it should return NA). And an implication is that we used sigma = 1 to produce the PIs shown above, both the incorrect ones and the "correct" ones. All of those PIs are completely wrong!

For a GLM, there is no such thing as a residual SD. Even with a random effect in there, we do not have a sensible concept of residual SD for prediction purposes, because what we should be doing instead is some kind of 95% interval of the fitted Poisson distribution, and somehow widening that interval based on the variations in site.

To tell you the truth, I don't have an appetite for trying to make sensible prediction intervals based on GLMMs. To my knowledge (which is definitely limited), there is no literature on this except for Bayesian models where we can produce posterior predictive distributions. Look in the vignette index and you will find an example, and that is the approach I recommend if you really want to do predictions. My approach for frequentist GLMs and GLMMs will be to simply disable the ability to make prediction intervals for these models.

I anticipate that Issue 1 will be fairly simple to fix, but Issue 2 could be a mess. I'll keep you posted...

Ah, I just looked back at #131 and that's where we encountered possible bronie users. How could I forget that???

Ok, thank you.

OK, I think I have the plot-related and prediction-related issues solved.

Regarding predictions (Issue 2), I was able to shoehorn an appropriate remedy into the standard code that is used for managing link functions, and incidentally already has family information. that makes it possible to fairly reliably determine when PIs are appropriate.

> emm1 <- emmeans(m1, ~mined, type = "resp")
> predict(emm1, interval = "conf")
 mined rate    SE  df asymp.LCL asymp.UCL
 yes   1.09 0.254 Inf     0.692      1.72
 no    3.42 0.311 Inf     2.862      4.09

Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

> predict(emm1, interval = "pred")
NULL
Warning message:
Prediction intervals are not available for this object

> emmip(m1, ~ mined, CIs = TRUE, PIs = TRUE)
Warning message:
Prediction intervals are not available for this object 
> # (but the resulting plot shows the CIs)

You can't see the fix for the other issue with this model because PIs are disabled, butIssue 1 has been fixed too:

> pigs.lm <- lm(inverse(conc) ~ source + factor(percent), data = pigs)
> emmip(pigs.lm, ~ source, CIs = TRUE, PIs = TRUE, type = "resp")

image

This works similarly with plot.emmGrid.

The care in checking for appropriateness of predictions also has a side effect for cases where we use bias.adj = TRUE, which requires a sigma value. Previously, it just blithely used sigma = sigma(model) which is almost always wrong with GLMs and GLMMs. Now it finds object@misc$sigma = NA so it can't do the bias correction unless you specify sigma. That is really as it should be, but I still need to work through that situation and issue appropriate warnings.

BTW, thanks for reporting this! On the brony thing, I had just been joking around with @mattansb about an earlier color palette choice that reminded me of My Little Pony books and products.

One of the best GitHub threads on #131 😂

Thank you very much for that fast reply, debugging and clarifcation.
By the way, I also like colors, but no pink and glitter :) And now, i know what a brony is. Learned something!

Some newly acquired knowledge is more useful than others...

I think this issue is resolved, so am closing it.