LKremer/ggpointdensity

Minimum density can be zero

It seems that with large, skewed datasets the density estimate for a point can be exactly zero. This doesn't make sense to me, since every plotted point is itself an observation and should contribute some density. It also presents a technical issue if I, say, wanted to log-transform the color scale.

library(ggplot2)
library(ggpointdensity)
df <- data.frame(x = c(rep(0, 100000), rnorm(100000)),
                 y = c(rep(0, 100000), rnorm(100000)))
p <- ggplot(df, aes(x = x, y = y)) +
  geom_pointdensity()
p
#> geom_pointdensity using method='kde2d' due to large number of points (>20k)

p + scale_color_continuous(trans = "log10")
#> geom_pointdensity using method='kde2d' due to large number of points (>20k)
#> Warning: Transformation introduced infinite values in discrete y-axis

Created on 2024-02-08 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       macOS Sonoma 14.2.1
#>  system   x86_64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Phoenix
#>  date     2024-02-08
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version    date (UTC) lib source
#>  cli              3.6.2      2023-12-11 [1] CRAN (R 4.3.0)
#>  colorspace       2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
#>  curl             5.2.0      2023-12-08 [1] CRAN (R 4.3.0)
#>  digest           0.6.34     2024-01-11 [1] CRAN (R 4.3.0)
#>  dplyr            1.1.4      2023-11-17 [1] CRAN (R 4.3.0)
#>  evaluate         0.23       2023-11-01 [1] CRAN (R 4.3.0)
#>  fansi            1.0.6      2023-12-08 [1] CRAN (R 4.3.0)
#>  farver           2.1.1      2022-07-06 [1] CRAN (R 4.3.0)
#>  fastmap          1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
#>  fs               1.6.3      2023-07-20 [1] CRAN (R 4.3.0)
#>  generics         0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
#>  ggplot2        * 3.4.4      2023-10-12 [1] CRAN (R 4.3.0)
#>  ggpointdensity * 0.1.0      2024-02-01 [1] Github (LKremer/ggpointdensity@02f3ab2)
#>  glue             1.7.0      2024-01-09 [1] CRAN (R 4.3.0)
#>  gtable           0.3.4      2023-08-21 [1] CRAN (R 4.3.0)
#>  highr            0.10       2022-12-22 [1] CRAN (R 4.3.0)
#>  htmltools        0.5.7      2023-11-03 [1] CRAN (R 4.3.0)
#>  knitr            1.45       2023-10-30 [1] CRAN (R 4.3.0)
#>  labeling         0.4.3      2023-08-29 [1] CRAN (R 4.3.0)
#>  lifecycle        1.0.4      2023-11-07 [1] CRAN (R 4.3.0)
#>  magrittr         2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
#>  MASS             7.3-60.0.1 2024-01-13 [1] CRAN (R 4.3.0)
#>  munsell          0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
#>  pillar           1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
#>  purrr            1.0.2      2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache          0.16.0     2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3      1.8.2      2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo             1.25.0     2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils          2.12.2     2022-11-11 [1] CRAN (R 4.3.0)
#>  R6               2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
#>  reprex           2.0.2      2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang            1.1.3      2024-01-10 [1] CRAN (R 4.3.0)
#>  rmarkdown        2.25       2023-09-18 [1] CRAN (R 4.3.0)
#>  rstudioapi       0.15.0     2023-07-07 [1] CRAN (R 4.3.0)
#>  scales           1.3.0      2023-11-28 [1] CRAN (R 4.3.1)
#>  sessioninfo      1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
#>  styler           1.10.2     2023-08-29 [1] CRAN (R 4.3.0)
#>  tibble           3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyselect       1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
#>  utf8             1.2.4      2023-10-22 [1] CRAN (R 4.3.0)
#>  vctrs            0.6.5      2023-12-01 [1] CRAN (R 4.3.0)
#>  withr            3.0.0      2024-01-16 [1] CRAN (R 4.3.0)
#>  xfun             0.41       2023-11-01 [1] CRAN (R 4.3.0)
#>  xml2             1.3.5      2023-07-06 [1] CRAN (R 4.3.0)
#>  yaml             2.3.8      2023-12-11 [1] CRAN (R 4.3.0)
#> 
#>  [1] /Users/ericscott/Library/R/x86_64/4.3/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

This may be related to the default bandwidth estimator used by MASS::kde2d(). If I supply my own values of h using a different bandwidth estimator (e.g. bw.nrd0()), I don't have this issue or the issue with bandwidth == 0 (#21). Even the documentation says that bw.nrd() "has remained the default for historical and compatibility reasons, rather than as a general recommendation". Perhaps it would be better for stat_pointdensity() to calculate its own bandwidth rather than relying on the defaults for kde2d().
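
For anyone who wants to compare the bandwidths directly, here is a minimal sketch that calls MASS::kde2d() on its own, outside of ggpointdensity. It assumes a smaller sample than the reprex (only to keep it fast) and an arbitrary seed. The factor of 4 is also an assumption of this sketch: it is there because kde2d() appears to divide h by 4 internally, so the factor keeps bw.nrd0() on a comparable scale. Whether a given estimator actually avoids zero densities will depend on the data, so the sketch only prints the values for inspection.

```r
library(MASS)  # provides kde2d() and bandwidth.nrd()

set.seed(42)   # arbitrary seed, chosen only for reproducibility
n <- 1e4       # smaller than the reprex, only to keep the sketch fast
x <- c(rep(0, n), rnorm(n))
y <- c(rep(0, n), rnorm(n))

# Bandwidths that kde2d() falls back to when h is not supplied.
h_default <- c(bandwidth.nrd(x), bandwidth.nrd(y))

# Alternative based on stats::bw.nrd0(). The factor of 4 is an assumption
# of this sketch, meant to match the scale kde2d() expects for h.
h_nrd0 <- 4 * c(bw.nrd0(x), bw.nrd0(y))

# Compare the two sets of bandwidths side by side.
print(rbind(default = h_default, nrd0 = h_nrd0))

# Pass the bandwidths explicitly instead of relying on the default.
dens <- kde2d(x, y, h = h_nrd0, n = 50)

# Check whether any grid cell still has a density of exactly zero.
print(min(dens$z))
```

If the printed minimum is still zero, the bandwidth is probably still too small relative to the spread of the data, and a different estimator or a manually chosen h may be worth trying.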