Points at Infinity are Missing

Question

const-ae opened this issue 5 years ago · 10 comments

Hi Lukas,

I have a feature suggestion: I just noticed that geom_pointdensity() does not plot points that have an x or y value of ±Inf, unlike geom_point(), which places them all at the border of the plot:

library(ggplot2)
df <- data.frame(x = rnorm(200),
                 y = c(rnorm(100), rep(-Inf, 100)))

ggplot(df, aes(x=x, y = y)) +
    geom_point()

ggplot(df, aes(x=x, y = y)) +
  ggpointdensity::geom_pointdensity()
#> Warning: Removed 100 rows containing non-finite values (stat_pointdensity).

Created on 2020-01-27 by the reprex package (v0.3.0)

This feature would be quite useful, because it would help to show how dense the overplotting at the bottom of the plot is.

Best, Constantin

Answer 1 · 2020-02-03T11:17:54.000Z

Hi Constantin,

interesting find! I guess from a mathematical point of view it's impossible to assign a density to those points 😅
Do you want to calculate the density in 1D instead, or what do you suggest?

Answer 2 · 2020-02-10T08:38:26.000Z

Yes, I think the most reasonable behaviour would be to calculate the density as if the points with Inf were at .Machine$double.xmax = 1.79e+308.
That way the density for the finite points wouldn't change, but one could still see the density at the border of the plot.

Answer 3 · 2020-02-21T11:08:50.000Z

@lysogeny added the points at infinity. They lack a density estimate for now.
@const-ae I have to test whether your suggestion works. I'm worried that replacing infinite values with huge numbers could cause, e.g., a floating-point overflow in the C code, because these numbers will be squared in the density calculation.
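To make the overflow concern concrete, here is a minimal sketch in C. It assumes IEEE 754 doubles and a density calculation that compares squared Euclidean distances; `squared_distance` and `substitution_overflows` are hypothetical helpers written for this illustration and are not taken from the ggpointdensity source.

```c
#include <float.h>
#include <math.h>

/* Squared Euclidean distance between (x1, y1) and (x2, y2).
   Hypothetical helper for illustration only; the package's real C code
   may compute distances differently. */
double squared_distance(double x1, double y1, double x2, double y2) {
    double dx = x1 - x2;
    double dy = y1 - y2;
    return dx * dx + dy * dy;
}

/* Returns 1 if substituting -DBL_MAX for -Inf makes the squared distance
   to an ordinary finite point overflow to +Inf, and 0 otherwise. */
int substitution_overflows(void) {
    /* A finite point at the origin versus a point whose y was replaced. */
    double d2 = squared_distance(0.0, 0.0, 0.0, -DBL_MAX);
    return isinf(d2) ? 1 : 0;
}
```

Under these assumptions the squared term does overflow, but to +Inf rather than to a crash, so a comparison against a finite squared radius simply fails. Two substituted points still get a finite distance from each other, because their difference along the substituted axis is exactly 0. Whether the actual C code behaves this way would still need to be tested.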

Answer 4 · 2020-06-15T14:43:37.000Z

Hey Lukas,

I just came across the problem again and gave it a try myself, building on the PR of lysogeny.

library(ggplot2)

df <- data.frame(x = rnorm(20001),
                 y = c(rnorm(20001 - 100), rep(-Inf, 100)))

ggplot(df, aes(x=x, y = y)) +
  geom_point()

ggplot(df, aes(x=x, y = y)) +
  ggpointdensity::geom_pointdensity(method = "kde2d")

ggplot(df, aes(x=x, y = y)) +
  ggpointdensity::geom_pointdensity(method = "default")

Created on 2020-06-15 by the reprex package (v0.3.0)

I modified the C count_neighbors() function to treat the distance on the axis that is infinite as 0. I think this is reasonable, because that is how they appear on the plot.

However, I didn't modify the kde2d density estimation, which means that the results of the two methods can differ, as you can see in the reprex.

Answer 5 · 2020-06-15T16:02:57.000Z

Oh, I realize that I might have made a mistake. Of course, I only want to treat the distance along an axis as 0 if both points are at an infinite position on that axis.
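The corrected rule can be sketched in C as follows. This is only an illustration under stated assumptions, not the code from the PR: `axis_diff` and `count_neighbors_sketch` are hypothetical names, the radius `r` and the plain O(n²) loop are assumptions, each point is counted as its own neighbor, and +Inf and -Inf are assumed not to coincide with each other.

```c
#include <math.h>
#include <stddef.h>

/* Difference along one axis. If both coordinates are the same infinity,
   the points coincide on that axis and the difference is 0. Opposite
   infinities are treated as infinitely far apart (an assumption). */
static double axis_diff(double a, double b) {
    if (isinf(a) && isinf(b)) {
        return ((a > 0) == (b > 0)) ? 0.0 : INFINITY;
    }
    /* finite - finite is the usual difference; finite - Inf is +/-Inf,
       so a finite point is never a neighbor of an infinite one. */
    return a - b;
}

/* Hypothetical neighbor count: counts[i] is the number of points j
   (including i itself) within Euclidean distance r of point i. */
void count_neighbors_sketch(const double *x, const double *y, size_t n,
                            double r, int *counts) {
    double r2 = r * r;
    for (size_t i = 0; i < n; i++) {
        counts[i] = 0;
        for (size_t j = 0; j < n; j++) {
            double dx = axis_diff(x[i], x[j]);
            double dy = axis_diff(y[i], y[j]);
            if (dx * dx + dy * dy <= r2) {
                counts[i]++;
            }
        }
    }
}
```

With this rule, points sharing y = -Inf are compared on x alone, which matches how they appear along the plot border, while finite points keep exactly the neighbor counts they would have had without the infinite points.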

Answer 6 · 2020-06-15T19:01:30.000Z

Okay, I updated the PR (#14).

The plots now look better as well:

library(ggplot2)

df <- data.frame(x = rnorm(20001),
                 y = c(rnorm(20001 - 100), rep(-Inf, 100)))

ggplot(df, aes(x=x, y = y)) +
  geom_point()

ggplot(df, aes(x=x, y = y)) +
  ggpointdensity::geom_pointdensity(method = "kde2d")

ggplot(df, aes(x=x, y = y)) +
  ggpointdensity::geom_pointdensity(method = "default")

Created on 2020-06-15 by the reprex package (v0.3.0)

Answer 7 · 2020-08-19T08:29:39.000Z

Thanks for fixing this @const-ae . The only issue is that method="kde2d" and method="default" handle infinite values differently now (i.e. kde2d doesn't calculate the density for those values at all). At some point I want to fix this inconsistency, but for now I'm closing this.

Answer 8 · 2020-08-19T08:34:57.000Z

Thanks for merging the PR. I agree that it is not ideal that the methods differ. I just took a look at how kde2d actually works, and the implementation https://github.com/cran/MASS/blob/c2ff394b1c45d58ebe72811699c683a3ca59e097/R/kde2d.R doesn't seem super complicated, so I guess it might be possible to write your own version of kde2d that can handle infinity :)

Answer 9 · 2021-09-30T21:52:16.000Z

I've installed the package from GitHub per the instructions in the README and updated the suggested packages. However, I still get

Warning message: Removed 49 rows containing non-finite values (stat_pointdensity).

Am I missing something?

P.S. Great package, BTW.

Answer 10 · 2021-10-01T02:45:44.000Z

To follow up on my previous post, "Yes, I'm missing something." Restarting R fixed my problem.