
Minimum density can be zero

It seems that with large, skewed datasets the density estimate for a point can be exactly zero. This doesn't make sense to me, since every plotted point is itself an observation and should contribute some density at its own location. It also causes a technical problem if I want to, say, log-transform the color scale.

library(ggplot2)
library(ggpointdensity)
df <- data.frame(x = c(rep(0, 100000), rnorm(100000)),
                 y = c(rep(0, 100000), rnorm(100000)))
p <- ggplot(df, aes(x = x, y = y)) +
  geom_pointdensity()
p
#> geom_pointdensity using method='kde2d' due to large number of points (>20k)

p + scale_color_continuous(trans = "log10")
#> geom_pointdensity using method='kde2d' due to large number of points (>20k)
#> Warning: Transformation introduced infinite values in discrete y-axis

Created on 2024-02-08 with reprex v2.0.2

This may be related to the default bandwidth estimator used by MASS::kde2d(). If I supply my own values of h using a different bandwidth estimator (e.g. bw.nrd0()), I don't have this issue or the issue with bandwidth == 0 (#21). Even the documentation says that bw.nrd() "has remained the default for historical and compatibility reasons, rather than as a general recommendation". Perhaps it would be better for stat_pointdensity() to calculate its own bandwidth rather than relying on the defaults of kde2d().
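
A minimal sketch of the bandwidth difference, using only MASS and base R. It assumes a more extreme variant of the data above (75% zeros, so both quartiles are exactly 0 and the result is deterministic). The nearest-grid lookup at the end is an illustrative assumption, not necessarily how stat_pointdensity() maps grid values back to points. The factor of 4 is there because kde2d() divides h by 4 internally.

```r
library(MASS)  # kde2d(), bandwidth.nrd()

set.seed(42)
# Assumed toy data: 75% of the values are exactly 0, so the quartiles coincide
x <- c(rep(0, 3000), rnorm(1000))
y <- c(rep(0, 3000), rnorm(1000))

# Default bandwidth used by kde2d(): min(sd, IQR / 1.34) is 0 when the IQR is 0
h_default <- c(bandwidth.nrd(x), bandwidth.nrd(y))

# bw.nrd0() falls back to the standard deviation when the quartiles coincide.
# kde2d() divides h by 4 internally, so rescale to keep the same convention.
h_nrd0 <- 4 * c(bw.nrd0(x), bw.nrd0(y))

# Density grid computed with the explicit, strictly positive bandwidth
dens <- kde2d(x, y, h = h_nrd0, n = 100)

# Illustrative lookup: give each point the value of its lower-left grid node
ix <- findInterval(x, dens$x)
iy <- findInterval(y, dens$y)
point_density <- dens$z[cbind(ix, iy)]

print(h_default)
print(h_nrd0)
print(min(point_density))
```

If the smallest per-point density printed here is strictly positive, a log10 color transformation stays finite for this data. Whether the same holds for other datasets depends on the grid resolution and on how far the most extreme points lie from the rest.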