More reliable mean() / variance() for dist_default
mjskay opened this issue · 2 comments
Currently, for distributions without an implementation of mean(), mean.dist_default() takes the mean of 1000 draws from the distribution (Lines 78 to 80 in 40830ed). Naturally, this introduces some sampling error:
> mean(dist_wrap("lnorm", 0, 1))
[1] 1.76567
> mean(dist_wrap("lnorm", 0, 1))
[1] 1.587978
> mean(dist_wrap("lnorm", 0, 1))
[1] 1.646674
> mean(dist_wrap("lnorm", 0, 1))
[1] 1.706244
I feel like there should be a less noisy alternative. One option would be to take the mean of a large number of evenly spaced quantiles, a sort of quasi-Monte Carlo approach:
mean.dist_default <- function(x, ...) {
  mean(quantile(x, stats::ppoints(1000)), na.rm = TRUE)
}
This yields decent results on a sanity check:
> mean(dist_wrap("lnorm", 0, 1))
[1] 1.645156 # should be exp(0.5)
> exp(0.5)
[1] 1.648721
The approximation might also be improved by using integrate() on continuous distributions, though I'm not sure how reliable this would be across arbitrary distributions:
mean.dist_default <- function(x, ...) {
  if (is.double(support(x)[[1]])) {
    # continuous distribution: integrate y * f(y) over the support
    limits <- quantile(x, c(0, 1))
    tryCatch(
      integrate(function(y) density(x, y) * y, limits[1], limits[2])$value,
      error = function(e) NA_real_
    )
  } else {
    # otherwise fall back to the quasi-Monte Carlo quantile average
    mean(quantile(x, stats::ppoints(1000)), na.rm = TRUE)
  }
}
This yields a result quite close to the true value:
> mean(dist_wrap("lnorm", 0, 1))
[1] 1.648721
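As a standalone sanity check of the integrate() step, without the package involved: the same integral can be computed directly from base R's lognormal density, since the lognormal(0, 1) mean is exp(mu + sigma^2 / 2) = exp(0.5).

```r
# Mean of lognormal(0, 1) as the integral of y * f(y) over its support,
# using only base stats functions (no distributional package involved).
integrate(function(y) stats::dlnorm(y) * y, 0, Inf)$value
# close to exp(0.5) = 1.648721
```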
And something similar could probably also be used to improve the calculations in variance.dist_default().
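For reference, a minimal sketch of how the quasi-Monte Carlo idea could carry over to variance — this is not the package's actual implementation, just the same quantile trick applied to the second moment:

```r
variance.dist_default <- function(x, ...) {
  # Evenly spaced quantiles instead of random draws: deterministic,
  # so repeated calls give the same answer (unlike sampling).
  stats::var(quantile(x, stats::ppoints(1000)), na.rm = TRUE)
}

# The same idea checked against base R directly, without the package:
# variance of N(0, 1) estimated from its quantile function.
stats::var(stats::qnorm(stats::ppoints(1000)))
# close to the true variance of 1
```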
Added, thanks!
Lovely, thank you!