st_simplify produces error when specifying tolerance values per simple feature on S2 code path
goergen95 opened this issue · 2 comments
Describe the bug
I am trying to simplify polygons with a tolerance value based on the number of vertices per simple feature.
On the s2::s2_simplify()
seems to expect a single value for the tolerance parameter. However, the docs for sf::st_simplify()
state:
dTolerance numeric; tolerance parameter, specified for all or for each feature geometry. [..]
To Reproduce
library(sf)
#> Linking to GEOS 3.12.2, GDAL 3.9.1, PROJ 9.4.1; sf_use_s2() is TRUE
sf_use_s2(TRUE)
nc <- read_sf(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc, st_crs("EPSG:4326"))
vertices <- as.numeric(table(st_coordinates(nc)[ ,5]))
nc2 <- st_simplify(nc, dTolerance = round(vertices/10))
#> Error in if (snap_radius > 3) {: the condition has length > 1
Created on 2024-09-18 with reprex v2.1.0
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.1 (2024-06-14)
#> os Ubuntu 22.04.4 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Etc/UTC
#> date 2024-09-18
#> pandoc 3.2 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> class 7.3-22 2023-05-03 [2] CRAN (R 4.4.1)
#> classInt 0.4-10 2023-09-05 [1] CRAN (R 4.4.1)
#> cli 3.6.3 2024-06-21 [1] RSPM (R 4.4.0)
#> DBI 1.2.3 2024-06-02 [1] RSPM (R 4.4.0)
#> digest 0.6.37 2024-08-19 [1] RSPM (R 4.4.0)
#> e1071 1.7-14 2023-12-06 [1] CRAN (R 4.4.1)
#> evaluate 0.24.0 2024-06-10 [1] RSPM (R 4.4.0)
#> fansi 1.0.6 2023-12-08 [1] RSPM (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [1] RSPM (R 4.4.0)
#> fs 1.6.4 2024-04-25 [1] RSPM (R 4.4.0)
#> glue 1.7.0 2024-01-09 [1] RSPM (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [1] RSPM (R 4.4.0)
#> KernSmooth 2.23-24 2024-05-17 [2] CRAN (R 4.4.1)
#> knitr 1.48 2024-07-07 [1] RSPM (R 4.4.0)
#> lifecycle 1.0.4 2023-11-07 [1] RSPM (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [1] RSPM (R 4.4.0)
#> pillar 1.9.0 2023-03-22 [1] RSPM (R 4.4.0)
#> pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.4.0)
#> proxy 0.4-27 2022-06-09 [1] CRAN (R 4.4.1)
#> Rcpp 1.0.13 2024-07-17 [1] RSPM (R 4.4.0)
#> reprex 2.1.0 2024-01-11 [1] RSPM (R 4.4.0)
#> rlang 1.1.4 2024-06-04 [1] RSPM (R 4.4.0)
#> rmarkdown 2.28 2024-08-17 [1] RSPM (R 4.4.0)
#> rstudioapi 0.16.0 2024-03-24 [1] RSPM (R 4.4.0)
#> s2 1.1.7 2024-07-17 [1] CRAN (R 4.4.1)
#> sessioninfo 1.2.2 2021-12-06 [1] RSPM (R 4.4.0)
#> sf * 1.0-17 2024-08-29 [1] Github (r-spatial/sf@e5bce2e)
#> tibble 3.2.1 2023-03-20 [1] RSPM (R 4.4.0)
#> units 0.8-5 2023-11-28 [1] CRAN (R 4.4.1)
#> utf8 1.2.4 2023-10-22 [1] RSPM (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [1] RSPM (R 4.4.0)
#> withr 3.0.1 2024-07-31 [1] RSPM (R 4.4.0)
#> wk 0.9.2 2024-07-09 [1] CRAN (R 4.4.1)
#> xfun 0.47 2024-08-17 [1] RSPM (R 4.4.0)
#> yaml 2.3.10 2024-07-26 [1] RSPM (R 4.4.0)
#>
#> [1] /usr/local/lib/R/site-library
#> [2] /usr/local/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
The PR looks sensible at first sight, but I wonder whether s2
does the simplification feature by feature (that is, does doing the loop in R instead of in s2
always result in the same output), and whether there is a performance loss. I would feel more comfortable to only use the mapply
construct when dTolerance
has a length > 1 (although the code is somewhat longer, then).
I adjusted the PR as you suggested. However, both of your hunches turned out to be true:
using multiple values for dTolerance
on the s2
code path using mapply
is very inefficient and does not return identical results. Not sure if there is much to do here on the R side and if it is worth to open an issue upstream for this feature?
If not, I could make this PR about changing sf
documentation to make users aware that supplying multiple values for dTolerance
when s2
is switched on is not allowed and maybe issue a warning?
Code Example
remotes::install_github("goergen95/sf", ref = "issue2442")
#> Skipping install of 'sf' from a github remote, the SHA1 (49b5fbd1) has not changed since last install.
#> Use `force = TRUE` to force installation
library(sf)
#> Linking to GEOS 3.13.0, GDAL 3.9.2, PROJ 9.5.0; sf_use_s2() is TRUE
library(microbenchmark)
nc <- read_sf(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc, st_crs("EPSG:4326"))
sf_use_s2(TRUE)
microbenchmark(
single_tolerance = {nc1 <- st_simplify(nc, dTolerance = 2500)},
multi_tolerance = {nc2 <- st_simplify(nc, dTolerance = rep(2500, nrow(nc)))},
times = 100
)
#> Unit: milliseconds
#> expr min lq mean median uq max
#> single_tolerance 18.02828 20.66847 24.17936 23.15384 27.14953 53.99792
#> multi_tolerance 95.44030 104.83197 111.95288 111.30223 115.69527 162.41727
#> neval cld
#> 100 a
#> 100 b
identical(nc1, nc2)
#> [1] FALSE
sf_use_s2(FALSE)
#> Spherical geometry (s2) switched off
suppressWarnings({
microbenchmark(
single_tolerance = {nc3 <- st_simplify(nc, dTolerance = 0.025)},
multi_tolerance = {nc4 <- st_simplify(nc, dTolerance = rep(0.025, nrow(nc)))},
times = 100
)})
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> single_tolerance 2.358385 2.825957 3.870791 3.612138 4.845731 6.692318 100
#> multi_tolerance 2.316221 2.775396 3.899538 3.819797 4.909197 7.329169 100
#> cld
#> a
#> a
identical(nc3, nc4)
#> [1] TRUE
Created on 2024-10-01 with reprex v2.1.0
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.1 (2024-06-14)
#> os Ubuntu 22.04.5 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Etc/UTC
#> date 2024-10-01
#> pandoc 3.2 @ /usr/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> class 7.3-22 2023-05-03 [2] CRAN (R 4.4.1)
#> classInt 0.4-10 2023-09-05 [1] CRAN (R 4.4.1)
#> cli 3.6.3 2024-06-21 [1] RSPM (R 4.4.0)
#> codetools 0.2-20 2024-03-31 [2] CRAN (R 4.4.1)
#> curl 5.2.3 2024-09-20 [1] RSPM (R 4.4.0)
#> DBI 1.2.3 2024-06-02 [1] RSPM (R 4.4.0)
#> digest 0.6.37 2024-08-19 [1] RSPM (R 4.4.0)
#> e1071 1.7-16 2024-09-16 [1] CRAN (R 4.4.1)
#> evaluate 1.0.0 2024-09-17 [1] RSPM (R 4.4.0)
#> fansi 1.0.6 2023-12-08 [1] RSPM (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [1] RSPM (R 4.4.0)
#> fs 1.6.4 2024-04-25 [1] RSPM (R 4.4.0)
#> glue 1.7.0 2024-01-09 [1] RSPM (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [1] RSPM (R 4.4.0)
#> KernSmooth 2.23-24 2024-05-17 [2] CRAN (R 4.4.1)
#> knitr 1.48 2024-07-07 [1] RSPM (R 4.4.0)
#> lattice 0.22-6 2024-03-20 [2] CRAN (R 4.4.1)
#> lifecycle 1.0.4 2023-11-07 [1] RSPM (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [1] RSPM (R 4.4.0)
#> MASS 7.3-60.2 2024-04-26 [2] CRAN (R 4.4.1)
#> Matrix 1.7-0 2024-04-26 [2] CRAN (R 4.4.1)
#> microbenchmark * 1.5.0 2024-09-04 [1] CRAN (R 4.4.1)
#> multcomp 1.4-26 2024-07-18 [1] RSPM (R 4.4.0)
#> mvtnorm 1.3-1 2024-09-03 [1] RSPM (R 4.4.0)
#> pillar 1.9.0 2023-03-22 [1] RSPM (R 4.4.0)
#> pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.4.0)
#> proxy 0.4-27 2022-06-09 [1] CRAN (R 4.4.1)
#> purrr 1.0.2 2023-08-10 [1] RSPM (R 4.4.0)
#> R.cache 0.16.0 2022-07-21 [1] RSPM (R 4.4.0)
#> R.methodsS3 1.8.2 2022-06-13 [1] RSPM (R 4.4.0)
#> R.oo 1.26.0 2024-01-24 [1] RSPM (R 4.4.0)
#> R.utils 2.12.3 2023-11-18 [1] RSPM (R 4.4.0)
#> Rcpp 1.0.13 2024-07-17 [1] RSPM (R 4.4.0)
#> remotes 2.5.0 2024-03-17 [1] RSPM (R 4.4.0)
#> reprex 2.1.0 2024-01-11 [1] RSPM (R 4.4.0)
#> rlang 1.1.4 2024-06-04 [1] RSPM (R 4.4.0)
#> rmarkdown 2.28 2024-08-17 [1] RSPM (R 4.4.0)
#> rstudioapi 0.16.0 2024-03-24 [1] RSPM (R 4.4.0)
#> s2 1.1.7 2024-07-17 [1] CRAN (R 4.4.1)
#> sandwich 3.1-1 2024-09-15 [1] CRAN (R 4.4.1)
#> sessioninfo 1.2.2 2021-12-06 [1] RSPM (R 4.4.0)
#> sf * 1.0-18 2024-10-01 [1] Github (goergen95/sf@49b5fbd)
#> styler 1.10.3 2024-04-07 [1] RSPM (R 4.4.0)
#> survival 3.6-4 2024-04-24 [2] CRAN (R 4.4.1)
#> TH.data 1.1-2 2023-04-17 [1] RSPM (R 4.4.0)
#> tibble 3.2.1 2023-03-20 [1] RSPM (R 4.4.0)
#> units 0.8-5 2023-11-28 [1] CRAN (R 4.4.1)
#> utf8 1.2.4 2023-10-22 [1] RSPM (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [1] RSPM (R 4.4.0)
#> withr 3.0.1 2024-07-31 [1] RSPM (R 4.4.0)
#> wk 0.9.3 2024-09-06 [1] CRAN (R 4.4.1)
#> xfun 0.47 2024-08-17 [1] RSPM (R 4.4.0)
#> yaml 2.3.10 2024-07-26 [1] RSPM (R 4.4.0)
#> zoo 1.8-12 2023-04-13 [1] CRAN (R 4.4.1)
#>
#> [1] /usr/local/lib/R/site-library
#> [2] /usr/local/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────