Points at Infinity are Missing
const-ae opened this issue
Hi Lukas,
I have a feature suggestion: I just noticed that geom_pointdensity() does not plot points that have an x or y value of ±Inf, unlike geom_point(), which puts them all down at the border of the plot:
library(ggplot2)
df <- data.frame(x = rnorm(200),
                 y = c(rnorm(100), rep(-Inf, 100)))
ggplot(df, aes(x = x, y = y)) +
  geom_point()
ggplot(df, aes(x = x, y = y)) +
  ggpointdensity::geom_pointdensity()
#> Warning: Removed 100 rows containing non-finite values (stat_pointdensity).
Created on 2020-01-27 by the reprex package (v0.3.0)
This feature would be quite useful, because it would show how dense the overplotting at the bottom border is.
Best, Constantin
Hi Constantin,
interesting find! I guess from a mathematical point of view it's impossible to assign a density to those points 😅
Do you want to calculate the density in 1D instead, or what do you suggest?
Yes, I think the most reasonable behaviour would be to calculate the density as if the points at Inf were at .Machine$double.xmax (about 1.79e+308), and the points at -Inf at -.Machine$double.xmax. That way the density of the finite points wouldn't change, but we could still see the density at the border of the plot.
@lysogeny added the points at infinity. They lack a density estimate for now.
@const-ae I have to test whether your suggestion works. I'm worried that replacing infinite values with huge numbers could cause, e.g., a floating-point overflow in the C code, because these numbers are squared in the density calculation.
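A quick way to see the overflow concern is to replace ±Inf with ±.Machine$double.xmax in plain R and look at the differences a distance calculation would square. This is a minimal sketch: clamp_inf() is a hypothetical helper, not part of ggpointdensity, and it only mimics the squaring step, not the package's actual C code.

```r
# Hypothetical helper (not part of ggpointdensity): map +Inf/-Inf to the
# largest finite double, as suggested above
clamp_inf <- function(v) {
  big <- .Machine$double.xmax
  v[v == Inf] <- big
  v[v == -Inf] <- -big
  v
}

y <- clamp_inf(c(0.3, -Inf, -Inf))

dy_same  <- y[2] - y[3]  # two clamped points coincide: difference is 0
dy_mixed <- y[1] - y[2]  # finite point vs clamped point: still finite (~1.8e308)
dy_mixed^2               # squaring overflows to Inf
```

Two clamped points end up at distance 0 from each other, but any squared distance between a finite point and a clamped one overflows to Inf, so the result depends on how the C code handles that Inf. A point at +Inf and one at -Inf on the same axis would overflow even before squaring, since 2 * .Machine$double.xmax is already Inf.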
Hey Lukas,
I just came across the problem again and gave it a try myself, building on lysogeny's PR.
library(ggplot2)
df <- data.frame(x = rnorm(20001),
                 y = c(rnorm(20001 - 100), rep(-Inf, 100)))
ggplot(df, aes(x = x, y = y)) +
  geom_point()
ggplot(df, aes(x = x, y = y)) +
  ggpointdensity::geom_pointdensity(method = "kde2d")
ggplot(df, aes(x = x, y = y)) +
  ggpointdensity::geom_pointdensity(method = "default")
Created on 2020-06-15 by the reprex package (v0.3.0)
I modified the C function count_neighbors() to treat the distance along an axis as 0 when the coordinate on that axis is infinite. I think this is reasonable, because that is how the points appear on the plot.
However, I didn't modify the kde2d density estimation, so the results of the two methods can differ, as you can see in the reprex.
Oh, I realize that I might have made a mistake. Of course, the distance on an axis should only be treated as 0 if both points are at an infinite position on that axis.
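To make the rule concrete, here is a small R sketch of neighbour counting in which the difference along an axis is 0 when both points sit at the same infinity. It is only an illustration: count_neighbors_sketch() and axis_diff() are hypothetical names, the circular radius r is an assumption, and requiring the same sign (both -Inf or both +Inf) is my reading of the rule; the actual C implementation in the PR may differ.

```r
# Hypothetical helper: difference between scalar a and every element of b.
# In R, Inf - Inf and -Inf - -Inf are NaN, so matching infinities are set to 0.
axis_diff <- function(a, b) {
  d <- a - b
  d[is.infinite(a) & is.infinite(b) & a == b] <- 0
  d
}

# Hypothetical sketch: count points within radius r of each point (the point
# itself included). A finite/infinite pair gets an infinite distance and is
# never counted as a neighbour.
count_neighbors_sketch <- function(x, y, r) {
  vapply(seq_along(x), function(i) {
    dx <- axis_diff(x[i], x)
    dy <- axis_diff(y[i], y)
    sum(dx^2 + dy^2 < r^2)
  }, integer(1))
}

x <- c(0, 0.1, 5, 0, 0.05)
y <- c(0, 0, 0, -Inf, -Inf)
count_neighbors_sketch(x, y, r = 1)  # returns 2 2 1 2 2
```

The two points at y = -Inf count each other as neighbours, while the finite point at (0, 0) does not count either of them, which matches how they are drawn: together on the bottom border, away from the finite points. Without the NaN-to-0 step, the sum for those two points would be NA.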
Okay, I updated the PR (#14).
The plots now look better as well:
library(ggplot2)
df <- data.frame(x = rnorm(20001),
                 y = c(rnorm(20001 - 100), rep(-Inf, 100)))
ggplot(df, aes(x = x, y = y)) +
  geom_point()
ggplot(df, aes(x = x, y = y)) +
  ggpointdensity::geom_pointdensity(method = "kde2d")
ggplot(df, aes(x = x, y = y)) +
  ggpointdensity::geom_pointdensity(method = "default")
Created on 2020-06-15 by the reprex package (v0.3.0)
Thanks for fixing this, @const-ae. The only issue is that method = "kde2d" and method = "default" now handle infinite values differently (kde2d doesn't calculate a density for those values at all). I want to fix this inconsistency at some point, but for now I'm closing this.
Thanks for merging the PR. I agree that it's not ideal that the methods differ. I just took a look at how kde2d actually works, and the implementation (https://github.com/cran/MASS/blob/c2ff394b1c45d58ebe72811699c683a3ca59e097/R/kde2d.R) doesn't seem very complicated, so it might be possible to write your own version of kde2d that can handle infinity :)
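As a rough illustration of that idea, the sketch below evaluates a Gaussian product-kernel density at each data point (instead of on a grid, as MASS::kde2d does) and applies the same rule that matching infinities are at distance 0. kde_at_points() and axis_diff() are hypothetical names, the bandwidths hx and hy are used as-is and would have to be chosen from the finite values only, and this is not the package's code.

```r
# Hypothetical helper: per-axis difference, with matching infinities set to 0
# (in R, Inf - Inf is NaN)
axis_diff <- function(a, b) {
  d <- a - b
  d[is.infinite(a) & is.infinite(b) & a == b] <- 0
  d
}

# Hypothetical sketch: 2D Gaussian product-kernel density evaluated at each
# data point. dnorm(Inf) is 0, so a finite/infinite pair contributes nothing.
kde_at_points <- function(x, y, hx, hy) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    zx <- axis_diff(x[i], x) / hx
    zy <- axis_diff(y[i], y) / hy
    sum(dnorm(zx) * dnorm(zy)) / (n * hx * hy)
  }, numeric(1))
}

x <- c(0, 0, 10)
y <- c(0, -Inf, -Inf)
dens <- kde_at_points(x, y, hx = 1, hy = 1)
```

With this rule, the points at -Inf get a finite, positive density from the other points on the same border, in the same spirit as the modified count_neighbors(), while the finite point only receives contributions from finite points.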
I've installed the package from GitHub following the instructions in the README and updated the suggested packages; however, I still get:
Warning message: Removed 49 rows containing non-finite values (stat_pointdensity).
Am I missing something?
P.S. Great package, by the way.
To follow up on my previous post, "Yes, I'm missing something." Restarting R fixed my problem.