DHARMa implementation for new `check_residuals()` function
mccarthy-m-g opened this issue · 5 comments
Before writing any code for the new `check_residuals()` function proposed in #376, I wanted to discuss the implementation we want to go with. There are two ways to go about the implementation, depending on how deeply we want to support model checks based on DHARMa’s simulated residuals in performance.
For context, below I first cover DHARMa’s basic workflow, then discuss two possible options for supporting DHARMa in performance.
The big picture questions are:

- Which model classes do we want to support?
- Do we want the simulated residuals to follow a uniform or normal distribution?
- Do we want DHARMa’s simulated residuals to be supported throughout performance, or just in the new `check_residuals()` function?
  - If we only support DHARMa’s simulated residuals in the new `check_residuals()` function, the implementation is pretty straightforward.
  - If we support DHARMa’s simulated residuals throughout performance, we’ll need to add S3 methods for objects of class `DHARMa` to various `check_()` functions (e.g., autocorrelation, zero-inflation, etc.), so the implementation will be a little more involved.
## Basic DHARMa workflow

### Simulated residuals
``` r
library(DHARMa)
#> This is DHARMa 0.4.6. For overview type '?DHARMa'. For recent changes, type news(package = 'DHARMa')
library(glmmTMB)
```
The basic workflow in DHARMa involves fitting a model, then simulating residuals from the fitted model.
``` r
model <- glmmTMB(
  count ~ mined + spp + (1 | site),
  family = poisson,
  data = Salamanders
)

# Note that this function sets a seed by default, so we don't need to set one
# ourselves.
simulated_residuals <- simulateResiduals(model)
simulated_residuals
#> Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
#>
#> Scaled residual values: 0.1555024 0.734551 0.4477616 0.5050258 0.4289974 0.2123359 0.2858764 0.5904198 0.8035344 0.3373841 0.5761363 0.1596766 0.2655446 0.04189079 0.147868 0.2487158 0.5633855 0.364279 0.4414209 0.4806934 ...
```
Regarding the fitted model, DHARMa currently supports the following classes:
> Currently supported are linear and generalized linear (mixed) models from ‘lme4’ (classes ‘lmerMod’, ‘glmerMod’), ‘glmmTMB’, ‘GLMMadaptive’ and ‘spaMM’, generalized additive models (‘gam’ from ‘mgcv’), ‘glm’ (including ‘negbin’ from ‘MASS’, but excluding quasi-distributions) and ‘lm’ model classes
The main interest in the `performance::check_residuals()` function is to add better checks for GL(M)Ms, so do we want to add methods for all of these classes, or exclude some (e.g., `lm`)?
The `simulated_residuals` object is of class `DHARMa` and is structured as shown below. Note that the fitted model is stored as the first item in the list, so the `simulated_residuals` object could be passed to other methods in performance that take a model object as input, if methods are added to support the `DHARMa` class (see the sketch after the structure output).
``` r
str(simulated_residuals, max.level = 1)
#> List of 17
#>  $ fittedModel            :List of 7
#>   ..- attr(*, "class")= chr "glmmTMB"
#>  $ modelClass             : chr "glmmTMB"
#>  $ additionalParameters   : list()
#>  $ nObs                   : int 644
#>  $ nSim                   : num 250
#>  $ refit                  : logi FALSE
#>  $ observedResponse       : int [1:644] 0 0 0 2 2 1 1 2 4 1 ...
#>  $ integerResponse        : logi TRUE
#>  $ problems               : list()
#>  $ fittedPredictedResponse: num [1:644] 0.197 0.197 0.197 1.896 1.896 ...
#>  $ fittedFixedEffects     : Named num [1:8] -1.625 2.264 -1.386 0.231 -0.77 ...
#>   ..- attr(*, "names")= chr [1:8] "(Intercept)" "minedno" "sppPR" "sppDM" ...
#>  $ fittedResiduals        : Named num [1:644] -0.2139 -0.7867 -0.0828 -0.2951 -0.2951 ...
#>   ..- attr(*, "names")= chr [1:644] "1" "2" "3" "4" ...
#>  $ method                 : chr [1:2] "PIT" "traditional"
#>  $ simulatedResponse      : num [1:644, 1:250] 0 0 0 0 3 1 2 2 1 1 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ scaledResiduals        : num [1:644] 0.156 0.735 0.448 0.505 0.429 ...
#>  $ time                   : 'proc_time' Named num [1:5] 0.149 0.009 0.159 0 0
#>   ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
#>  $ randomState            :List of 4
#>  - attr(*, "class")= chr "DHARMa"
```
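To illustrate that last point, a method could recover the model from the `DHARMa` object like so (a minimal sketch; `fittedModel` is just the list element shown in the structure above):

``` r
# Recover the fitted model stored inside the DHARMa object, so model-based
# performance methods could reuse it
fitted_model <- simulated_residuals$fittedModel
class(fitted_model)
#> [1] "glmmTMB"
```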
To get the residuals stored in `simulated_residuals`, use `residuals()`:
``` r
simulated_residuals |>
  residuals() |>
  head(20)
#>  [1] 0.15550238 0.73455098 0.44776164 0.50502581 0.42899743 0.21233592
#>  [7] 0.28587644 0.59041980 0.80353437 0.33738415 0.57613629 0.15967657
#> [13] 0.26554460 0.04189079 0.14786802 0.24871584 0.56338550 0.36427899
#> [19] 0.44142089 0.48069339
```
By default the simulated residuals have a uniform distribution, but the scaling can be transformed to any other distribution, such as the normal distribution. However, in the case of the normal distribution, outliers will be mapped to -Inf / Inf by default (and if you want to get around this you need to decide which lower and upper values to convert -Inf / Inf to). From the DHARMa vignette:

> These normal residuals will behave exactly like the residuals of a linear regression. However, for reasons of a) numeric stability with a low number of simulations, which makes it necessary to decide on which value outliers are to be transformed, and b) my conviction that it is much easier to visually detect deviations from uniformity than normality, DHARMa checks all residuals in the uniform space, and I would personally advise against using the transformation.
``` r
# Note the Inf values here:
simulated_residuals |>
  residuals(quantileFunction = qnorm) |>
  tail(30)
#>  [1]  1.08007805  0.10778534  0.09064290 -0.25809108         Inf -2.00160575
#>  [7]         Inf -1.09151823  0.79929750  0.57712362 -0.97183120 -0.22312363
#> [13] -0.25864418  1.31949100 -1.43237801  1.07378421 -0.29403004 -0.20597054
#> [19]  1.15364152  2.09225482 -0.02295151 -0.80227605 -0.31481034  0.20750053
#> [25] -1.12027563 -0.97722105  1.01980584  1.85693880 -1.51072945         Inf

# Convert -Inf / Inf to -7 / 7:
simulated_residuals |>
  residuals(quantileFunction = qnorm, outlierValues = c(-7, 7)) |>
  tail(30)
#>  [1]  1.08007805  0.10778534  0.09064290 -0.25809108  7.00000000 -2.00160575
#>  [7]  7.00000000 -1.09151823  0.79929750  0.57712362 -0.97183120 -0.22312363
#> [13] -0.25864418  1.31949100 -1.43237801  1.07378421 -0.29403004 -0.20597054
#> [19]  1.15364152  2.09225482 -0.02295151 -0.80227605 -0.31481034  0.20750053
#> [25] -1.12027563 -0.97722105  1.01980584  1.85693880 -1.51072945  7.00000000
```
Given this outlier-mapping issue, I think it’s probably best to stick with the uniform distribution unless there’s an opinionated way to decide the range automatically.

The main reason I brought this up, though, is that if the residuals are transformed to normal, we can use standard residual checks that assume normal residuals instead of the tests shipped with DHARMa, so some of the existing methods in performance could be reused here (although the sticky issue of defining the range for outliers would have to be dealt with); see the sketch below.
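For example, something like the following could work (a sketch only; the `outlierValues` here are arbitrary, and `shapiro.test()` is just a stand-in for whichever normality check performance would actually reuse):

``` r
# Transform the uniform simulated residuals to the normal scale, mapping
# outliers to an arbitrary finite range so downstream tests don't choke on Inf
normal_residuals <- residuals(
  simulated_residuals,
  quantileFunction = qnorm,
  outlierValues = c(-7, 7)
)

# A standard normality check could then be applied to these residuals
shapiro.test(normal_residuals)
```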
### Residual tests
DHARMa provides a number of specialized goodness-of-fit tests on the simulated residuals (a couple are demonstrated after this list):

- `testUniformity()`: tests if the overall distribution conforms to expectations
- `testOutliers()`: tests if there are more simulation outliers than expected
- `testDispersion()`: tests if the simulated dispersion is equal to the observed dispersion
- `testQuantiles()`: fits a quantile regression of residuals against a predictor (by default, the predicted value), and tests if this conforms to the expected quantiles
- `testCategorical(simulationOutput, catPred = testData$group)`: tests residuals against a categorical predictor
- `testZeroInflation()`: tests if there are more zeros in the data than expected from the simulations
- `testGeneric()`: tests if a generic (user-defined) summary statistic deviates from model expectations
- `testTemporalAutocorrelation()`: tests for temporal autocorrelation in the residuals
- `testSpatialAutocorrelation()`: tests for spatial autocorrelation in the residuals; can also be used with a generic distance function, for example to test for phylogenetic signal in the residuals
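For instance, the dispersion and zero-inflation tests run directly on the `DHARMa` object (shown here with plotting suppressed):

``` r
# Run a couple of the tests above on the simulated residuals
testDispersion(simulated_residuals, plot = FALSE)
testZeroInflation(simulated_residuals, plot = FALSE)
```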
The proposed `performance::check_residuals()` function could either use `DHARMa::testUniformity()` under the hood:
``` r
uniformity <- testUniformity(simulated_residuals, plot = FALSE)

class(uniformity)
#> [1] "ks.test" "htest"

uniformity
#>
#>  Asymptotic one-sample Kolmogorov-Smirnov test
#>
#> data:  simulationOutput$scaledResiduals
#> D = 0.086753, p-value = 0.0001234
#> alternative hypothesis: two-sided
```
Or it could just use the Kolmogorov-Smirnov test directly, since `DHARMa::testUniformity()` is basically a wrapper around it with an additional argument for producing a plot:
``` r
ks.test(residuals(simulated_residuals), "punif")
#> Warning in ks.test.default(residuals(simulated_residuals), "punif"): ties should
#> not be present for the Kolmogorov-Smirnov test
#>
#>  Asymptotic one-sample Kolmogorov-Smirnov test
#>
#> data:  residuals(simulated_residuals)
#> D = 0.086753, p-value = 0.0001234
#> alternative hypothesis: two-sided
```
## Basic DHARMa implementation

A basic implementation of `check_residuals()` (ignoring the S3 methods for now) could do something like the following: simulate the residuals, run and print the test, then invisibly return a list containing the test results and residuals.
``` r
check_residuals <- function(x) {
  # Note: simulate from the function argument `x`, not a global `model` object
  simulated_residuals <- residuals(simulateResiduals(x))
  test_results <- suppressWarnings(ks.test(simulated_residuals, "punif"))
  # This would be printed nicely in the actual implementation, but you get the
  # idea.
  print(test_results)
  # We need to store the residuals here so they can be passed to the plot()
  # method that will be added to `see` later.
  invisible(list(test = test_results, residuals = simulated_residuals))
}

check_residuals(model)
#>
#>  Asymptotic one-sample Kolmogorov-Smirnov test
#>
#> data:  simulated_residuals
#> D = 0.086753, p-value = 0.0001234
#> alternative hypothesis: two-sided
```
## Involved DHARMa implementation

A more involved implementation would add a `simulate_residuals()` function that wraps `DHARMa::simulateResiduals()`. We would give this function S3 methods for the different model classes we want to support.

The resulting object could then be passed to the various `check_()` functions we want to support. For example:

- `check_residuals.DHARMa()`
- `check_autocorrelation.DHARMa()`
- `check_overdispersion.DHARMa()`
- etc.

If we went with this approach, we’d advocate for a workflow like the basic DHARMa workflow demonstrated above (and sketched below). Since the `simulated_residuals` object also contains the fitted model, it could also easily be passed on to `check_model()`.
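The intended user-facing workflow might look something like this (a sketch only; none of these performance methods exist yet):

``` r
# Hypothetical workflow once the S3 methods are in place
simulated_residuals <- simulate_residuals(model)
check_residuals(simulated_residuals)
check_overdispersion(simulated_residuals)
check_model(simulated_residuals)
```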
Created on 2023-06-17 with reprex v2.0.2
This sounds good to me. Basic steps would be:

- Writing a `simulate_residuals()` generic, with a `.default` method and methods for the different other classes. The returned object would be of class (e.g.) `performance_sim_res` or something.
- Adding methods for this class to the various performance functions, like `check_residuals.performance_sim_res()` etc., and adding plot/print methods.

`simulate_residuals()` would be a wrapper around the DHARMa function, as you suggested (see the sketch below).
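A minimal sketch of what those steps could look like (the class name `performance_sim_res` and the `...` forwarding are assumptions, not a final design):

``` r
simulate_residuals <- function(x, ...) {
  UseMethod("simulate_residuals")
}

# Default method: delegate to DHARMa and tag the result with our own class,
# forwarding any extra arguments (e.g. n, refit) to simulateResiduals()
simulate_residuals.default <- function(x, ...) {
  out <- DHARMa::simulateResiduals(fittedModel = x, ...)
  class(out) <- c("performance_sim_res", class(out))
  out
}

# Example of a check_() method dispatching on the new class; residuals()
# still dispatches to DHARMa because "DHARMa" remains in the class vector
check_residuals.performance_sim_res <- function(x, ...) {
  suppressWarnings(stats::ks.test(stats::residuals(x), "punif"))
}
```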
I'm not that familiar with the DHARMa package - can we use those functions also for simple models, meaning that we could replace all base R `residuals()` functions with DHARMa functions? Pinging @florianhartig, maybe you could also clarify some points?
From my original comment, DHARMa currently supports the following classes:
- ‘lme4’ (classes ‘lmerMod’, ‘glmerMod’)
- ‘glmmTMB’
- ‘GLMMadaptive’
- ‘spaMM’
- ‘gam’ (from ‘mgcv’)
- ‘glm’ (including ‘negbin’ from ‘MASS’, but excluding quasi-distributions)
- ‘lm’
So yes, some simple models are also supported.
Personally, I don't think replacing the existing support for any model with DHARMa functions is a great idea though, since the definition of DHARMa's residuals is different from that of standard residuals. From the vignette:
> DHARMa aims at solving these problems by creating readily interpretable residuals for generalized linear (mixed) models that are standardized to values between 0 and 1, and that can be interpreted as intuitively as residuals for the linear model. This is achieved by a simulation-based approach, similar to the Bayesian p-value or the parametric bootstrap, that transforms the residuals to a standardized scale. The basic steps are:
>
> 1. Simulate new response data from the fitted model for each observation.
> 2. For each observation, calculate the empirical cumulative density function for the simulated observations, which describes the possible values (and their probability) at the predictor combination of the observed value, assuming the fitted model is correct.
> 3. The residual is then defined as the value of the empirical density function at the value of the observed data, so a residual of 0 means that all simulated values are larger than the observed value, and a residual of 0.5 means half of the simulated values are larger than the observed value.
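To make that definition concrete, here is a hand-rolled version for a single observation (a sketch; DHARMa's actual PIT implementation additionally randomizes within ties for discrete responses):

``` r
# Empirical CDF of the 250 simulated responses for observation 1, evaluated
# at the observed value -- roughly the scaled residual described above
i <- 1
sims <- simulated_residuals$simulatedResponse[i, ]
obs <- simulated_residuals$observedResponse[i]
ecdf(sims)(obs)
```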
Also, since the residuals are simulated, switching would probably change the results of a lot of existing code, which isn't great; and it would add computing time.
Instead, I think supporting simple models in the same way we'll support GL(M)Ms with the new `performance::simulate_residuals()` function is the way to go, so people can opt into this workflow for simple models if they want it.
Thanks for your interest in coupling to DHARMa - I'm happy to provide help / advice where needed. A few thoughts regarding your questions / discussions:
- The main problem that is solved by the randomised quantile residuals (RQRs) as implemented in DHARMa is to homogenise non-normal and in particular discrete distributions such as the Poisson, binomial, etc. I don't know of a better solution for this problem and would argue that all other options, e.g. binning for binomial, are inferior. Thus, I would recommend that the main decision criterion for switching to RQRs be whether you have a normal vs. any other GLM distribution. Of course, the question is then whether it makes sense to stay with the standard residuals for LMs, but that's up to you! Note that for nSim -> infinity, simulation effects vanish, so DHARMa and raw residuals are identical for an LM (up to the transformation, see next point).
- After having switched to RQRs, you can transform back to normal or keep uniform for the visualisation. I don't see an advantage of transforming for tests (e.g. KS is rank-based anyway), but there are pros and cons of different visualisation options. Related to discussions I have had with a few people, I am planning to add alternative, possibly normal visualisations to DHARMa. In reference to the discussion in #376, I would say that I personally am quite happy with the res ~ predicted plot in DHARMa, and I would encourage you to see how it performs in comparison to alternatives in practical data analysis problems. As case studies, see e.g. https://theoreticalecology.github.io/AdvancedRegressionModels/6C-CaseStudies.html#hurricanes or https://theoreticalecology.github.io/AdvancedRegressionModels/6C-CaseStudies.html#owls. I am very interested in the question of which residual definitions make most sense to users, so if you have good reasons to prefer another visualisation, I'd be very happy to consider those - ideally, tell me via https://github.com/florianhartig/DHARMa/issues
- Somewhat contrary to what I read out of the discussion above, I would say RQRs don't really offer large conceptual advantages for dealing with REs. Thus, for me the decision to move to RQRs or DHARMa specifically is mainly a question of the distribution rather than of the presence of REs in the model. However, there are specific things for mixed models that are better solved in DHARMa. In particular, all tests in DHARMa use simulation-based null distributions, meaning they don't require the df of the model, which bypasses the df problem in mixed models. For example, overdispersion tests based on Pearson Chi-2 have a bias towards underdispersion for GLMMs because of the df problem. The DHARMa default test is based on a simulation-based null distribution, and is thus perfectly calibrated.
- Finally, I am a bit concerned that by wrapping DHARMa in performance, you may hide help functions and options that are essential to understand what's going on and to work properly with the respective functions. Note that as DHARMa is agnostic about the model being tested (I don't check model classes, I only interact with the model via simulate), all defaults are set to the most conservative option that works for all models. In many cases, the help advises using different settings in particular situations or for particular models. This can probably not be fully solved, but I would encourage you to include options in your wrappers that allow forwarding DHARMa options via `...` as far as possible, and to link prominently to the help files in DHARMa so that users can find those options and also remarks on the interpretation of the plots / tests (see the forwarding sketch below).
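For example, the wrapper sketched earlier already forwards extra arguments via `...`, so users could reach DHARMa's options directly (a usage sketch; `n` is `simulateResiduals()`'s number-of-simulations argument):

``` r
# Forwarded through ... to DHARMa::simulateResiduals(): increase the number
# of simulations from the default 250
simulate_residuals(model, n = 1000)
```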
Happy to dive deeper into this discussion if you find it useful. I just received a grant for hiring a programmer who is supposed to continue the development and also the interoperability of DHARMa, see https://twitter.com/florianhartig/status/1704810170428121538. Once this position is filled (I guess some time next year), it would be possible to devote a limited amount of her/his time to solving possible interoperability issues. Btw, feel free to forward this information to interested candidates.
Best
Florian