Have `simulateResiduals()` or `createData()` changed in the latest version?
Something seems to have changed when simulating residuals. With the same seed, I now get different results in package version 0.4.7 than in the previous version (0.4.6):
```r
packageVersion("DHARMa")
#> [1] '0.4.6'
set.seed(123)
dat <- DHARMa::createData(sampleSize = 100, overdispersion = 0.5, family = poisson())
m <- glm(observedResponse ~ Environment1, family = poisson(), data = dat)
res <- DHARMa::simulateResiduals(m)
head(res$scaledResiduals)
#> [1] 0.55348979 0.44011511 0.39826168 0.98249554 0.90753010 0.05809462
```
```r
packageVersion("DHARMa")
#> [1] '0.4.7'
set.seed(123)
dat <- DHARMa::createData(sampleSize = 100, overdispersion = 0.5, family = poisson())
m <- glm(observedResponse ~ Environment1, family = poisson(), data = dat)
res <- DHARMa::simulateResiduals(m)
head(res$scaledResiduals)
#> [1] 0.01814574 0.70620144 0.56075987 0.19281890 0.31422678 0.10114779
```
The source of this change might be located in `createData()`:
```r
packageVersion("DHARMa")
#> [1] '0.4.6'
set.seed(123)
dat <- DHARMa::createData(sampleSize = 100, overdispersion = 0.5, family = poisson())
head(dat)
#>   ID observedResponse Environment1 group time         x         y
#> 1  1                1  -0.52254795     1    1 0.2875775 0.5999890
#> 2  2                3   0.92471787     1    2 0.7883051 0.3328235
#> 3  3                1   0.20273145     1    3 0.4089769 0.4886130
#> 4  4                5   0.03005945     1    4 0.8830174 0.9544738
#> 5  5                3  -0.19485332     1    5 0.9404673 0.4829024
#> 6  6                1   0.76049308     1    6 0.0455565 0.8903502
```
```r
packageVersion("DHARMa")
#> [1] '0.4.7'
set.seed(123)
dat <- DHARMa::createData(sampleSize = 100, overdispersion = 0.5, family = poisson())
head(dat)
#>   ID observedResponse Environment1 group time          x         y
#> 1  1                0   0.95570681     1   31 0.43943154 0.8034185
#> 2  2                1  -0.17052933     1   79 0.31170220 0.5468262
#> 3  3                0  -0.76119044     1   51 0.40947495 0.6623176
#> 4  4                0   0.05205932     1   14 0.01046711 0.1716985
#> 5  5                0  -0.54985330     1   67 0.18384952 0.6330554
#> 6  6                0  -0.02717647     1   42 0.84272932 0.3118697
```
Created on 2024-10-18 with reprex v2.1.1
OK, I think I know what happened: we introduced a piece of code that randomises the `time` variable to decorrelate it from the grouping variable. This happens BEFORE the random distribution is applied. The time sampling therefore consumes random numbers first, and the Poisson draws come from a later part of the random number stream than in the previous version, even with the same seed.
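A minimal sketch of this mechanism follows. The two functions `old_style()` and `new_style()` are hypothetical and only mimic the order of the random draws. They are not DHARMa's actual `createData()` code, and the constant `lambda = 2` is an arbitrary assumption:

```r
# Hypothetical stand-ins, NOT DHARMa's code: they only illustrate how an
# extra random draw early in a function shifts every later draw.
old_style <- function(n) {
  time <- seq_len(n)            # deterministic, consumes no random numbers
  y <- rpois(n, lambda = 2)     # Poisson draws start right at the seed
  data.frame(time = time, y = y)
}

new_style <- function(n) {
  time <- sample.int(n)         # extra random draws happen BEFORE rpois()
  y <- rpois(n, lambda = 2)     # so these use a later part of the stream
  data.frame(time = time, y = y)
}

set.seed(123)
a <- old_style(100)
set.seed(123)
b <- new_style(100)

identical(a$y, b$y)  # FALSE: same seed, but different Poisson draws
```

`set.seed()` only fixes the starting point of the stream. Any call that consumes random numbers earlier in a function changes every draw that follows it.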
We could probably move the time sampling to the end, so that the random number sequence stays consistent with previous DHARMa versions. I'm not sure, though, how serious a problem this is. Is this just something you were wondering about, or are you relying on `createData()` being reproducible with a seed across versions?
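Using the same hypothetical setup (not DHARMa's code), here is a sketch of the proposed reordering. Once the extra draws come after the Poisson draws, the old response values are restored:

```r
# Hypothetical sketch, NOT DHARMa's code: the extra time sampling is moved
# after the Poisson draws.
old_style <- function(n) {
  time <- seq_len(n)            # deterministic, consumes no random numbers
  y <- rpois(n, lambda = 2)
  data.frame(time = time, y = y)
}

fixed_style <- function(n) {
  y <- rpois(n, lambda = 2)     # same position in the stream as in old_style()
  time <- sample.int(n)         # extra random draws now come last
  data.frame(time = time, y = y)
}

set.seed(123)
a <- old_style(100)
set.seed(123)
b <- fixed_style(100)

identical(a$y, b$y)  # TRUE: the response matches the old ordering
```

This only restores continuity inside the function. The extra draws still advance the generator, so any random code that runs afterwards without being reseeded would still see a shifted stream.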
Best
Florian
OK, I just wanted to clarify what had changed. I don't think anything needs to change on your side; I can just update the tests accordingly.