LKremer/ggpointdensity

Normalized density for facetted plots

seasmith opened this issue · 4 comments

Would it make sense to add a computed statistic that gives each point's neighbor count relative to the maximum neighbor count in its group?

# i.e.
data$r_neighbors <- data$n_neighbors / max(data$n_neighbors)

Yes it would make sense in some cases! I wouldn't want to make this the default behavior because I think raw neighbor counts are a bit more intuitive than relative ones, but for facetted plots I see how it can be useful. Maybe something like geom_pointdensity(relative=TRUE) would be worthwhile?
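For illustration, the per-group normalization discussed above could be sketched in base R as follows. This is only a sketch under stated assumptions, not the package's implementation: the toy data, the `group` column, the `radius` value, and the brute-force `count_neighbors()` helper are all hypothetical, and only `n_neighbors` and `r_neighbors` are names taken from this thread.

```r
# Hypothetical toy data: two groups (think facets) with different sample sizes.
set.seed(1)
dat <- data.frame(
  x = c(rnorm(200), rnorm(50)),
  y = c(rnorm(200), rnorm(50)),
  group = rep(c("a", "b"), times = c(200, 50))
)

# Hypothetical helper: for each point, count the other points within `radius`.
# Brute force via dist(), O(n^2); fine for a sketch, not for large data.
count_neighbors <- function(x, y, radius = 0.5) {
  d <- as.matrix(dist(cbind(x, y)))
  rowSums(d < radius) - 1  # subtract 1 to exclude the point itself
}

# Raw counts, computed separately within each group.
dat$n_neighbors <- NA_real_
for (g in unique(dat$group)) {
  idx <- dat$group == g
  dat$n_neighbors[idx] <- count_neighbors(dat$x[idx], dat$y[idx])
}

# Relative counts: divide by the group maximum, so the densest point
# of every group maps to 1 regardless of group size.
dat$r_neighbors <- ave(dat$n_neighbors, dat$group,
                       FUN = function(n) n / max(n))
```

With raw counts, the larger group would dominate a shared color scale. After dividing by the group maximum, both groups span the same range up to 1, which is the behavior asked for in facetted plots.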

Returning a computed stat would bring the function's behavior in line with other ggplot2 functions (e.g. stat_density_2d returns density, ndensity, level, and nlevel).

# Example
library(ggplot2)
library(ggpointdensity)

ggplot(diamonds, aes(carat, price)) +
  geom_pointdensity(aes(color = stat(r_neighbors)))

I feel density and ndensity are more in line with ggplot2 and would make the function more extensible (e.g. if the function accepted something like method = "kde2d" for 2D kernel density or method = "bkde2d" for 2D binned kernel density).

# Example

# Default method would be 'nn'
ggplot(diamonds, aes(carat, price)) +
  geom_pointdensity(aes(color = stat(ndensity)), method = "nn")

# kernel-density
ggplot(diamonds, aes(carat, price)) +
  geom_pointdensity(aes(color = stat(ndensity)), method = "kde2d")

# binned kernel-density
ggplot(diamonds, aes(carat, price)) +
  geom_pointdensity(aes(color = stat(ndensity)), method = "bkde2d")

Returning a computed stat would bring the function's behavior in line with other ggplot2 functions (e.g. stat_density_2d returns density, ndensity, level, and nlevel).

This is already the case. stat_pointdensity computes a stat called n_neighbors.
I just realized you can even use this stat to plot the density as you originally proposed:

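library(viridis)  # provides scale_color_viridis()

# dat: any data frame with numeric columns x and y
ggplot(dat, aes(x = x, y = y, color = stat(n_neighbors) / max(n_neighbors))) +
    geom_pointdensity() +
    scale_color_viridis()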

I could tweak stat_pointdensity to return both n_neighbors and the density for convenience.

Regarding your last suggestion with method = "something", I'm experimenting with something like this at the moment. Mostly to test out different algorithms to find an efficient one that can handle many points (issue #2).

This was implemented in @bjreisman's recent pull request #8, so I'm closing.