Interpreting DHARMa diagnostics for a binomial GAM
Hi Florian,
I hope this message finds you well. My name is Katia, and I am currently working on my master's thesis at Universitat de Girona (Girona, Catalonia, Spain). I have read the very useful vignette you provided for the DHARMa package. However, there are still a few points that I'm not certain about, and I would like to ask you directly for advice.
I built a generalized additive model (GAM) to analyze species occurrence as a function of habitat fragmentation, habitat amount and other spatial covariates. The model is as follows:
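    library(mgcv)  # provides gam()

    # Binomial GAM; SITES must be a factor for the random-effect basis (bs = "re")
    Modelo_gam <- gam(
      occurrence ~ frag + s(hab_amount) + s(X, Y) + s(SITES, bs = "re"),
      family = binomial,
      data = data2,
      method = "REML"
    )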
In this study, "occurrence" is the binary response variable indicating the presence or absence of the species. The primary predictor is "frag," which represents habitat fragmentation. The model also includes smooth terms for "hab_amount," representing the amount of habitat, and for the spatial coordinates "X" and "Y," which were included to account for spatial autocorrelation. Additionally, the model includes a random effect for "SITES," specified as a smooth term with a random-effect basis (bs = "re"). The main purpose of the study is to assess whether habitat fragmentation and habitat amount influence primate species occurrence within patches of forest.
I ran diagnostics on the model using the DHARMa package:
I wonder whether these warnings are severe enough to suggest that this model should not be used. My sample size is N = 978 landscapes. The Q-Q plot looks fine, but the highly significant KS test and the red lines in the residual plots really concern me.
Thank you so much for your time. I look forward to your responses.
Best wishes,
Katia R
Hi Katia,
The significance of the quantile tests and the KS uniformity test is probably due to the large sample size. As written in the DHARMa vignette (Interpreting residuals and recognizing misspecification problems):
Once a residual effect is statistically significant, look at the magnitude to decide if there is a problem: it is crucial to note that significance is NOT a measure of the strength of the residual pattern, it is a measure of the signal/noise ratio, i.e. whether you are sure there is a pattern at all. Significance in hypothesis tests depends on at least 2 ingredients: the strength of the signal and the number of data points. If you have a lot of data points, residual diagnostics will nearly inevitably become significant, because having a perfectly fitting model is very unlikely. That, however, doesn’t necessarily mean that you need to change your model. The p-values confirm that there is a deviation from your null hypothesis. It is, however, in your discretion to decide whether this deviation is worth worrying about. For example, if you see a dispersion parameter of 1.01, I would not worry, even if the dispersion test is significant. A significant value of 5, however, is clearly a reason to move to a model that accounts for overdispersion.
So, I suggest you take a look at the magnitude of the statistics first.
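As a rough illustration of the point about sample size, here is a minimal base-R sketch. It does not use DHARMa or the model from this thread: the "residuals" are hypothetical draws from a Beta(1.2, 1) distribution, chosen only to mimic a small, fixed departure from the uniform distribution that scaled residuals should follow. The function name `ks_summary` is made up for this example.

```r
# Hypothetical illustration in base R: these are NOT DHARMa residuals.
# Scaled residuals from a well-specified model are Uniform(0, 1); here we
# draw from Beta(1.2, 1) to mimic a small, fixed departure from uniformity.
set.seed(1)

ks_summary <- function(n) {
  u <- rbeta(n, shape1 = 1.2, shape2 = 1)  # slightly non-uniform values
  test <- ks.test(u, "punif")              # KS test against Uniform(0, 1)
  c(n = n, D = unname(test$statistic), p_value = test$p.value)
}

# Same underlying deviation, increasing sample size
results <- t(sapply(c(100, 1000, 20000), ks_summary))
print(round(results, 4))
```

The underlying deviation is identical in every row, so the p-value mostly reflects how much data is available to detect it. The D statistic (the largest gap between the empirical and the uniform CDF) is the quantity to read as the magnitude.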
You could also try other tests and plots in DHARMa, such as a test for spatial autocorrelation in the residuals or a plot of the scaled (quantile) residuals against a specific predictor.
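A minimal sketch of these checks follows, under several assumptions: `data2` here is a simulated stand-in (the real data are not part of this thread), the object name `res` is arbitrary, and each row has a unique (X, Y) location, which `testSpatialAutocorrelation` expects. It needs the mgcv and DHARMa packages.

```r
library(mgcv)
library(DHARMa)

# Simulated stand-in for data2 (hypothetical values, one row per landscape)
set.seed(42)
n <- 300
data2 <- data.frame(
  frag = rnorm(n),
  hab_amount = runif(n),
  X = runif(n),
  Y = runif(n),
  SITES = factor(rep(1:20, each = n / 20))
)
site_effect <- rnorm(20, sd = 0.5)[data2$SITES]
eta <- -0.5 + 0.8 * data2$frag + sin(3 * data2$hab_amount) + site_effect
data2$occurrence <- rbinom(n, 1, plogis(eta))

Modelo_gam <- gam(
  occurrence ~ frag + s(hab_amount) + s(X, Y) + s(SITES, bs = "re"),
  family = binomial, data = data2, method = "REML"
)

# Simulation-based scaled (quantile) residuals
res <- simulateResiduals(Modelo_gam, n = 250)

# Scaled residuals against individual predictors
plotResiduals(res, form = data2$frag)
plotResiduals(res, form = data2$hab_amount)

# Moran's I test on the residuals; assumes one observation per location
sp_test <- testSpatialAutocorrelation(res, x = data2$X, y = data2$Y,
                                      plot = FALSE)
print(sp_test)

# Magnitude of the deviation from uniformity (KS statistic D)
print(testUniformity(res, plot = FALSE)$statistic)
```

With the real data, only the lines from `simulateResiduals` onward would be needed, applied to the fitted `Modelo_gam`.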
Best,
Melina
Hello Katia,
to add to what Melina said: although the tests are significant, the magnitude of the deviation from expectations is very small, as all plots look nearly like they should. So, based on the plots you showed, I don't see any reason to change anything about your model.
However, as Melina suggested, since you seem to have spatial data, it would make sense to test for spatial autocorrelation, to plot the residuals against all predictors, and potentially to group the residuals spatially, as suggested in https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html#binomial-data
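For the grouping step, a minimal sketch could look like the following. It assumes that SITES is a sensible spatial grouping unit, which the thread does not confirm, and it again uses a simulated stand-in for `data2`; the names `res` and `res_site` are arbitrary.

```r
library(mgcv)
library(DHARMa)

# Simulated stand-in for data2 (hypothetical values): 20 sites, 15 rows each
set.seed(7)
data2 <- data.frame(
  frag = rnorm(300), hab_amount = runif(300),
  X = runif(300), Y = runif(300),
  SITES = factor(rep(1:20, each = 15))
)
eta <- -0.5 + 0.8 * data2$frag + sin(3 * data2$hab_amount)
data2$occurrence <- rbinom(300, 1, plogis(eta))

Modelo_gam <- gam(
  occurrence ~ frag + s(hab_amount) + s(X, Y) + s(SITES, bs = "re"),
  family = binomial, data = data2, method = "REML"
)
res <- simulateResiduals(Modelo_gam, n = 250)

# Aggregate observations and simulations within each site, then recompute
# the scaled residuals: one residual per site instead of one per row
res_site <- recalculateResiduals(res, group = data2$SITES)

plot(res_site)                               # standard DHARMa plots, per site
print(testUniformity(res_site, plot = FALSE))
```

Grouping turns the 0/1 observations into per-site sums, so patterns that are hard to see in binary data can become visible. The tests then have only as many data points as there are groups.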
Best
Florian
ps.: although given you fit s(X, Y), it's very unlikely that you have a residual spatial pattern.
Hi Katia,
yes, very likely s(X, Y) will have absorbed any strong spatial pattern that you have.
Again, as I said, unless you find anything in the res ~ predictors and possibly when grouping residuals, I don't see a problem with the model.
I'll close this, as this seems to be resolved.
Best
F