r-spatial/sf

point created by intersecting two linestrings does not intersect them

DOSull opened this issue · 13 comments

Describe the bug
A point I generate by intersecting two linestrings fails expected st_intersects and does not yield expected st_relate pattern.

To Reproduce

sd <- 0.01
lines <- st_multilinestring(list(matrix(c(-1, 1, -1, 1) + 
                                           rnorm(4, 0, sd), 2, 2), 
                                  matrix(c(-1, 1, 1, -1) + 
                                           rnorm(4, 0, sd), 2, 2))) |> 
  st_sfc() |> st_cast("LINESTRING")
pt <- st_intersection(lines[1], lines[2]) |> st_sfc()

st_intersects(pt, lines)
Sparse geometry binary predicate list of length 1, where the predicate was `intersects'
 1: (empty)

st_relate(pt, lines)
     [,1]        [,2]       
[1,] "FF0FFF102" "FF0FFF102"

So the point resulting from intersecting two lines does not intersect them, and is not related to them as expected ("0FFFFF102"). Inspection of the objects shows that the point is at the intersection of the two lines close to (0, 0).

> lines
Geometry set for 2 features 
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -0.9997257 ymin: -1.020536 xmax: 1.012496 ymax: 0.9998001
CRS:           NA
LINESTRING (-0.9929678 -0.9947806, 1.012496 0.9...
LINESTRING (-0.9997257 0.9998001, 0.9876878 -1....
> pt
Geometry set for 1 feature 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 0.00154847 ymin: -0.01806063 xmax: 0.00154847 ymax: -0.01806063
CRS:           NA
POINT (0.00154847 -0.01806063)

Additional context
If I make lines running from (-1 -1) to (1 1) and (-1 1) to (1 -1) exactly (by setting sd <- 0) then they intersect at (0 0) and that point satisfies expected spatial predicates against the generating intersecting lines, i.e. the st_relate pattern is "0FFFFF102".

I have experimented with setting precision on the geometries but have only obtained expected results reliably when precision is set very crudely. If the problem is deemed 'upstream' in GEOS, as I suspect it might be (geos::geos_relate produces the same results on these geometries as sf) what might be a viable workaround in sf?

I initially encountered this problem working with lwgeom::st_split() trying to split lines by their intersection points, and getting many fewer output linestrings than expected. On looking more closely this was what I ran into.

> sessioninfo::session_info() ─ Session info ────────────────────────────────────────────────────────── setting value version R version 4.3.3 (2024-02-29) os macOS Sonoma 14.5 system aarch64, darwin20 ui RStudio language (EN) collate en_US.UTF-8 ctype en_US.UTF-8 tz Pacific/Auckland date 2024-07-01 rstudio 2024.04.1+748 Chocolate Cosmos (desktop) pandoc NA

─ Packages ───────────────────────────────────────────────────────────
package * version date (UTC) lib source
class 7.3-22 2023-05-03 [1] CRAN (R 4.3.3)
classInt 0.4-10 2023-09-05 [1] CRAN (R 4.3.0)
cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.1)
codetools 0.2-19 2023-02-01 [1] CRAN (R 4.3.3)
DBI 1.2.2 2024-02-16 [1] CRAN (R 4.3.1)
e1071 1.7-14 2023-12-06 [1] CRAN (R 4.3.1)
KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.3)
lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.3)
lwgeom * 0.2-14 2024-02-21 [1] CRAN (R 4.3.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
Matrix 1.6-5 2024-01-11 [1] CRAN (R 4.3.3)
proxy 0.4-27 2022-06-09 [1] CRAN (R 4.3.0)
raster 3.6-26 2023-10-14 [1] CRAN (R 4.3.1)
Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.3.1)
rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
sf * 1.0-16 2024-03-24 [1] CRAN (R 4.3.1)
simplextree 1.0.1 2020-09-12 [1] CRAN (R 4.3.0)
sp 2.1-4 2024-04-30 [1] CRAN (R 4.3.1)
terra 1.7-71 2024-01-31 [1] CRAN (R 4.3.1)
units 0.8-5 2023-11-28 [1] CRAN (R 4.3.1)

[1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────

sf_extSoftVersion()
GEOS GDAL proj.4 GDAL_with_GEOS USE_PROJ_H PROJ
"3.11.0" "3.5.3" "9.1.0" "true" "true" "9.1.0"

Please add set.seed to any example usong the RNG. What precision settings do you prefer to apply?

Point taken about set.seed - although in this case, part of the point is the variability. When I set precision I usually go with 1e8 as that's a number that I think you've said in the past has some previous currency! However, in this example I have to set it much more crudely than that st_set_precision(1)! for the outcome to be consistently what I would hope/expect.

I know that line intersection is notoriously challenging, but that seems rather extreme!

1e8 in this context is the inverse, so:

> formatC(1/1e8, format="f", digits=8)
[1] "0.00000001"

and so on. Precision means converting the coordinates from double precision to an integer grid by multiplying by say 1e8 and checking for integer equality on that grid. sf by default uses floating point double precision rather than the integer grid (which was the default in defunct rgeos). So any jitter greater than the chosen precision will lead to the point being seen as not equal.

> set.seed(1)
> sd <- 0.01
> lines <- st_multilinestring(list(matrix(c(-1, 1, -1, 1) + 
+                                            rnorm(4, 0, sd), 2, 2), 
+                                   matrix(c(-1, 1, 1, -1) + 
+                                            rnorm(4, 0, sd), 2, 2))) |> 
+   st_sfc() |> st_cast("LINESTRING")
> pt <- st_intersection(lines[1], lines[2]) |> st_sfc()
> 
> st_intersects(pt, lines)
Sparse geometry binary predicate list of length 1, where the predicate
was `intersects'
 1: (empty)
> st_precision(lines) <- 1e8
> pt <- st_intersection(lines[1], lines[2]) |> st_sfc()
> st_intersects(pt, lines)
Sparse geometry binary predicate list of length 1, where the predicate
was `intersects'
 1: (empty)
> st_precision(lines) <- 1e2
> pt <- st_intersection(lines[1], lines[2]) |> st_sfc()
> st_intersects(pt, lines)
Sparse geometry binary predicate list of length 1, where the predicate
was `intersects'
 1: 1, 2

I know that precision is set in the opposite sense to how one might expect i.e., using a 'scale' multiplier, per the JTS documentation at https://locationtech.github.io/jts/javadoc/org/locationtech/jts/geom/PrecisionModel.html linked from the st_as_binary help. I made that mistake a while back in a different issue #1855!

But the above is not what I get...

> set.seed(1)
> sd <- 0.1
> lines <- st_multilinestring(list(matrix(c(-1, 1, -1, 1) + 
+                                            rnorm(4, 0, sd), 2, 2), 
+                                   matrix(c(-1, 1, 1, -1) + 
+                                            rnorm(4, 0, sd), 2, 2))) |> 
+   st_sfc() |> st_cast("LINESTRING")
> 
> st_precision(lines) <- 1e8
> pt <- st_intersection(lines[1], lines[2]) |> st_sfc()
> st_intersects(pt, lines)
Sparse geometry binary predicate list of length 1, where the predicate was `intersects'
 1: (empty)
> 
> st_precision(lines) <- 1e2
> pt <- st_intersection(lines[1], lines[2]) 
> st_intersects(pt, lines)
Sparse geometry binary predicate list of length 1, where the predicate was `intersects'
 1: (empty)

In any case... I'm not jittering the point, I'm changing the ends of the lines, and they are intersecting to yield the point, which by definition ought therefore to intersect the lines that generated it? It also seems wrong that to get this result we need a grid as crude as 0.01 resolution (and on my machine I have to go to 0.1, i.e. st_precision(...) <- 10, and even then I frequently get 'misses'!

> set.seed(2)
> sd <- 0.1
> lines <- st_multilinestring(list(matrix(c(-1, 1, -1, 1) +
+                                            rnorm(4, 0, sd), 2, 2), 
+                                   matrix(c(-1, 1, 1, -1) +
+                                            rnorm(4, 0, sd), 2, 2))) |> 
+   st_sfc() |> st_cast("LINESTRING")
> 
> st_precision(lines) <- 1e8
> pt <- st_intersection(lines[1], lines[2]) |> st_sfc()
> st_intersects(pt, lines)
Sparse geometry binary predicate list of length 1, where the predicate was `intersects'
 1: (empty)
> 
> st_precision(lines) <- 1e1
> pt <- st_intersection(lines[1], lines[2]) 
> st_intersects(pt, lines)
Sparse geometry binary predicate list of length 1, where the predicate was `intersects'
 1: (empty)

Also worth noting how apparent the poor resolution is when these objects are plotted. The plot below is for the objects generated with set.seed(2) and st_precision(lines) <- 1e1 on my machine:

> plot(lines)
> plot(pt, add = TRUE)

image

For what it's worth geos::geos_intersects yields the same incorrect result on my machine.

> geos::geos_intersects(pt, lines)
[1] FALSE FALSE

Is there some way in which this could be machine-dependent? MacOS ARM architecture here.

How was sf installed, CRAN binary or otherwise?

Trying on an aarch64 with R 4.4.1 and sf 1.0-16, I get your results, so a double-precision 64-bit/extended precision 80-bit difference most likely. My GEOS on Intel (Fedora 40) is newer too, but not much. What you are "jittering" are the endpoints of the lines, so indeed something is going on here. @paleolimbot any thoughts?

I would have installed from CRAN. I'm glad to hear that I am probably not missing something obvious!

Also... this:

> st_distance(lines, pt)
            [,1]
[1,] 8.21824e-17
[2,] 0.00000e+00

where I get some variation on 0 running many times with different random line end offsets, but the intersections still come up empty.

It looks as though the lines would need to be noded at the intersection:

> set.seed(1)
> lines <- st_multilinestring(list(matrix(c(-1, 1, -1, 1) + 
+                                            rnorm(4, 0, sd), 2, 2), 
+                                   matrix(c(-1, 1, 1, -1) + 
+                                            rnorm(4, 0, sd), 2, 2))) |> 
+   st_sfc() |> st_cast("LINESTRING")
> pt <- st_intersection(lines[1], lines[2]) |> st_sfc()
> st_intersects(pt, lines)
Sparse geometry binary predicate list of length 1, where the predicate
was `intersects'
 1: (empty)
> st_intersects(pt, st_union(lines))
Sparse geometry binary predicate list of length 1, where the predicate
was `intersects'
 1: 1
> st_coordinates(lines)
              X          Y L1
[1,] -1.0062645 -1.0083563  1
[2,]  1.0018364  1.0159528  1
[3,] -0.9967049  1.0048743  2
[4,]  0.9917953 -0.9926168  2
> st_coordinates(st_union(lines))
                X            Y L1 L2
[1,] -1.006264538 -1.008356286  1  1
[2,] -0.001176252  0.004844437  1  1
[3,] -0.001176252  0.004844437  2  1
[4,]  1.001836433  1.015952808  2  1
[5,] -0.996704922  1.004874291  3  1
[6,] -0.001176252  0.004844437  3  1
[7,] -0.001176252  0.004844437  4  1
[8,]  0.991795316 -0.992616753  4  1
> st_coordinates(pt)
                X           Y
[1,] -0.001176252 0.004844437
> st_length(st_nearest_points(pt, lines))
[1] 1.612109e-16 6.930893e-17
> st_length(st_nearest_points(pt, st_union(lines)))
[1] 0

See also libgeos/geos#585:

> st_is_within_distance(pt, lines, .Machine$double.eps)
Sparse geometry binary predicate list of length 1, where the predicate
was `is_within_distance'
 1: 1, 2

So this is obviously the 'feature-not-a-bug' that prevents me using lwgeom::st_split to successfully break linestrings at all their points of intersection. I've written code that works around it by inserting all the intersection points into the linestrings but it's not practical because it's slow! Frustrating!

Sorry for the delay (away from Wifi this week!).

Definitely reproducible outside sf!

library(geos)

sd <- 0.01
coords <- rbind(
  matrix(c(-1, 1, -1, 1) + rnorm(4, 0, sd), 2, 2), 
  matrix(c(-1, 1, 1, -1) + rnorm(4, 0, sd), 2, 2)
)

ids <- c(1, 1, 2, 2)

linestrings <- geos_make_linestring(coords[, 1], coords[, 2], feature_id = ids)

pt <- geos_intersection(linestrings[1], linestrings[2])

geos_intersects(pt, linestrings[1])
#> [1] FALSE
geos_intersects(pt, linestrings[2])
#> [1] FALSE

A slightly smaller scale version of that is the orientation index (which should be zero of the point exactly intersects a segment):

library(geos)
set.seed(1234)

sd <- 0.01
coords1 <- matrix(c(-1, 1, -1, 1) + rnorm(4, 0, sd), 2, 2)
coords2 <- matrix(c(-1, 1, 1, -1) + rnorm(4, 0, sd), 2, 2)
coords <- rbind(coords1, coords2)

ids <- c(1, 1, 2, 2)

linestrings <- geos_make_linestring(
  coords[, 1],
  coords[, 2],
  feature_id = ids
)

pt <- geos_intersection(linestrings[1], linestrings[2])

geos_orientation_index(
  list(coords1[1, 1], coords1[1, 2], coords1[2, 1], coords1[2, 2]),
  list(geos_x(pt), geos_y(pt))
)
#> [1] 1

There is somewhere on the internet a very cool javascript animation of why the point-segment intersection problem is very difficult, although I can't find it at the moment. I think that GEOS uses exact or very high floating point math for this kind of operation; however, it is probably just that the exact point of intersection just can't be represented with a double.

As Roger noted, a "distance within" check is really what you need here.

Thanks for the information. The irony is that since I am making the points by intersecting the lines, I already know that they intersect them. As I noted in passing at the top, I ran into this while trying to lwgeom::st_split a collection of lines at their points of intersection. I assume that failed due to this problem. A home made split_line_by_points function is in my future, it would seem, with the unfortunate side effect that the sub-lines it produces won't be coincident with the original lines!

geos has geos_project and other linear reference functions from GEOS. Otherwise maybe look at how sfnetworks and others node linestring objects, perhaps spatstat.linnet.