florianhartig/DHARMa

Pattern in RE-grouped residual for binomial GLMM

florianhartig opened this issue · 2 comments

Question via email

I'm using the DHARMa package to check the residuals of my binomial (0/1) generalized additive models for my master's thesis project, and I'm having some difficulty interpreting the results. When I run the main plots, I obtain these results:

[Image: DHARMa main residual plots for the model]

I followed what is suggested in the manual and tried to group the residuals. I have 4 natural grouping variables: 2 REs and 2 temporal ones (year and month), and my model includes all 4 of these variables plus some other continuous predictors. If I understood it correctly, when I group the residuals (e.g. by one of the REs), do the plots indicate some underdispersion? My main doubt is about the plot on the right (residuals vs. predictions). What does it indicate about the model? Is this residual pattern concerning?

[Image: DHARMa residual plots with residuals grouped by one of the REs]

Hello,

this doesn't look super concerning to me, but it does confirm the point made in the vignette: residuals can look perfectly fine per data point, yet show certain patterns once you look at them per group.
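To make the per-group idea concrete, here is a minimal base-R sketch of what grouping simulation-based residuals amounts to. It is not DHARMa's code: the helper `groupedScaledResiduals()` and all data are hypothetical. It assumes a vector of observed 0/1 responses and an n x nSim matrix of responses simulated from the fitted model (the object returned by `simulateResiduals()` holds these as `observedResponse` and `simulatedResponse`), sums both within each group (which, as far as I know, is what `recalculateResiduals()` does with its default `aggregateBy = sum`), and then computes a randomized PIT residual per group.

```r
# Hypothetical helper (NOT DHARMa's implementation): simplified grouped
# simulation-based residuals.
#   observed : numeric vector of observed responses, length n
#   simulated: n x nSim matrix of responses simulated from the fitted model
#   group    : grouping variable of length n
groupedScaledResiduals <- function(observed, simulated, group) {
  # Sum observed and simulated responses within each group;
  # rowsum() returns one row per group (groups sorted)
  obsSum <- rowsum(observed, group)[, 1]
  simSum <- rowsum(simulated, group)
  # Fraction of simulated group sums below / equal to the observed sum
  # (obsSum is recycled down the columns, so row i is compared with obsSum[i])
  below <- rowMeans(simSum < obsSum)
  equal <- rowMeans(simSum == obsSum)
  # Randomize ties, which are frequent for sums of 0/1 data
  below + runif(length(obsSum)) * equal
}

# Hypothetical example: observed and simulated data come from the same
# probabilities, so the grouped residuals should be roughly uniform
set.seed(1)
nGroups <- 50; perGroup <- 10; nSim <- 250
group <- rep(seq_len(nGroups), each = perGroup)
n <- length(group)
p <- plogis(rnorm(n))
observed <- rbinom(n, 1, p)
simulated <- matrix(rbinom(n * nSim, 1, p), nrow = n)

groupRes <- groupedScaledResiduals(observed, simulated, group)
summary(groupRes)
```

Because each group residual is computed from group sums, a model can reproduce the individual 0/1 outcomes well while still mismatching the between-group variation, which is exactly the kind of pattern that only shows up after grouping.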

Before interpreting the pattern, I wanted to check that it is not a fluke: RE estimates are known to be biased (shrunk) towards the mean, so I was wondering whether a pattern could emerge from that alone. I therefore played around with variations of the following code:

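```r
library(DHARMa)
library(lme4)

# Simulate binomial data with a random intercept (100 groups, RE variance 2)
testData = createData(sampleSize = 500, 
                      overdispersion = 0, 
                      family = binomial(), 
                      randomEffectVariance = 2, 
                      numGroups = 100)

# Fit a correctly specified binomial GLMM
fit <- glmer(observedResponse ~ Environment1 + (1|group), 
             family = "binomial", data = testData)

summary(fit)

# Residuals per observation, then recalculated (aggregated) per RE group
res <- simulateResiduals(fit, plot = TRUE)
res2 <- recalculateResiduals(res, group = testData$group)
plot(res2)
```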

I didn't really see any spurious patterns emerge. Thus, I would conclude that this is real and that you have a bit of underdispersion and a bit of heteroskedasticity in your RE-grouped residuals.

Underdispersion is often a sign of overfitting, so it could be that you have quite a lot of REs relative to your number of observations, and thus your REs overfit a bit. As underdispersion is usually conservative (i.e. your p-values are larger than they should be), I would not be concerned about this; I think the model is probably still acceptable, but you could consider.
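What underdispersion in grouped residuals means can be illustrated with a similarly hypothetical base-R sketch. The helper `groupedDispersionRatio()` is my own assumption in the spirit of a simulation-based dispersion test, not DHARMa's exact statistic: it compares the variance of the observed group sums around their simulated expectation with the same quantity for each simulated data set. A ratio below 1 means the data vary less between groups than the model predicts. The toy "model" simulates group effects with SD 1 although the data contain none, mimicking random effects that imply more between-group variation than the data actually show.

```r
# Hypothetical helper (NOT DHARMa's exact test statistic): ratio of the
# observed to the simulated spread of group sums around their expectation.
# Values < 1 suggest underdispersion, values > 1 overdispersion.
groupedDispersionRatio <- function(observed, simulated, group) {
  obsSum <- rowsum(observed, group)[, 1]
  simSum <- rowsum(simulated, group)
  expected <- rowMeans(simSum)                 # simulated mean per group
  obsSpread <- var(obsSum - expected)          # spread of the observed data
  simSpread <- apply(simSum, 2, function(s) var(s - expected))
  obsSpread / mean(simSpread)
}

# Toy example: the data have no between-group variation (p = 0.5), but each
# simulation re-draws group effects with SD 1, so the "model" expects more
# variation between groups than the data show
set.seed(42)
nGroups <- 50; perGroup <- 10; nSim <- 250
group <- rep(seq_len(nGroups), each = perGroup)
n <- length(group)
observed <- rbinom(n, 1, 0.5)
simulated <- replicate(nSim, {
  b <- rnorm(nGroups, sd = 1)
  rbinom(n, 1, plogis(b[group]))
})

ratio <- groupedDispersionRatio(observed, simulated, group)
ratio
```

On the actual model, the corresponding check would be to call DHARMa's `testDispersion()` on the recalculated residuals, e.g. `testDispersion(res2)` for the grouped object from the code above.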

Feel free to comment on this. I'll leave this issue open as a reminder to check in a bit more detail whether it's a good idea to group by RE, and to add comments on this to the vignette!

Thank you so much for your answer!