r-spatial/sf

st_simplify produces error when specifying tolerance values per simple feature on S2 code path

goergen95 opened this issue · 2 comments

Describe the bug
I am trying to simplify polygons with a tolerance value based on the number of vertices per simple feature.
On the $S^2$ code path, this results in an error because s2::s2_simplify() seems to expect a single value for the tolerance parameter. However, the docs for sf::st_simplify() state:

dTolerance numeric; tolerance parameter, specified for all or for each feature geometry. [..]

To Reproduce

library(sf)
#> Linking to GEOS 3.12.2, GDAL 3.9.1, PROJ 9.4.1; sf_use_s2() is TRUE

sf_use_s2(TRUE)  

nc <- read_sf(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc, st_crs("EPSG:4326"))
vertices <- as.numeric(table(st_coordinates(nc)[ ,5]))
nc2 <- st_simplify(nc, dTolerance = round(vertices/10))
#> Error in if (snap_radius > 3) {: the condition has length > 1

Created on 2024-09-18 with reprex v2.1.0

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14)
#>  os       Ubuntu 22.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Etc/UTC
#>  date     2024-09-18
#>  pandoc   3.2 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  class         7.3-22  2023-05-03 [2] CRAN (R 4.4.1)
#>  classInt      0.4-10  2023-09-05 [1] CRAN (R 4.4.1)
#>  cli           3.6.3   2024-06-21 [1] RSPM (R 4.4.0)
#>  DBI           1.2.3   2024-06-02 [1] RSPM (R 4.4.0)
#>  digest        0.6.37  2024-08-19 [1] RSPM (R 4.4.0)
#>  e1071         1.7-14  2023-12-06 [1] CRAN (R 4.4.1)
#>  evaluate      0.24.0  2024-06-10 [1] RSPM (R 4.4.0)
#>  fansi         1.0.6   2023-12-08 [1] RSPM (R 4.4.0)
#>  fastmap       1.2.0   2024-05-15 [1] RSPM (R 4.4.0)
#>  fs            1.6.4   2024-04-25 [1] RSPM (R 4.4.0)
#>  glue          1.7.0   2024-01-09 [1] RSPM (R 4.4.0)
#>  htmltools     0.5.8.1 2024-04-04 [1] RSPM (R 4.4.0)
#>  KernSmooth    2.23-24 2024-05-17 [2] CRAN (R 4.4.1)
#>  knitr         1.48    2024-07-07 [1] RSPM (R 4.4.0)
#>  lifecycle     1.0.4   2023-11-07 [1] RSPM (R 4.4.0)
#>  magrittr      2.0.3   2022-03-30 [1] RSPM (R 4.4.0)
#>  pillar        1.9.0   2023-03-22 [1] RSPM (R 4.4.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] RSPM (R 4.4.0)
#>  proxy         0.4-27  2022-06-09 [1] CRAN (R 4.4.1)
#>  Rcpp          1.0.13  2024-07-17 [1] RSPM (R 4.4.0)
#>  reprex        2.1.0   2024-01-11 [1] RSPM (R 4.4.0)
#>  rlang         1.1.4   2024-06-04 [1] RSPM (R 4.4.0)
#>  rmarkdown     2.28    2024-08-17 [1] RSPM (R 4.4.0)
#>  rstudioapi    0.16.0  2024-03-24 [1] RSPM (R 4.4.0)
#>  s2            1.1.7   2024-07-17 [1] CRAN (R 4.4.1)
#>  sessioninfo   1.2.2   2021-12-06 [1] RSPM (R 4.4.0)
#>  sf          * 1.0-17  2024-08-29 [1] Github (r-spatial/sf@e5bce2e)
#>  tibble        3.2.1   2023-03-20 [1] RSPM (R 4.4.0)
#>  units         0.8-5   2023-11-28 [1] CRAN (R 4.4.1)
#>  utf8          1.2.4   2023-10-22 [1] RSPM (R 4.4.0)
#>  vctrs         0.6.5   2023-12-01 [1] RSPM (R 4.4.0)
#>  withr         3.0.1   2024-07-31 [1] RSPM (R 4.4.0)
#>  wk            0.9.2   2024-07-09 [1] CRAN (R 4.4.1)
#>  xfun          0.47    2024-08-17 [1] RSPM (R 4.4.0)
#>  yaml          2.3.10  2024-07-26 [1] RSPM (R 4.4.0)
#> 
#>  [1] /usr/local/lib/R/site-library
#>  [2] /usr/local/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

The PR looks sensible at first sight, but I wonder whether s2 does the simplification feature by feature (that is, does doing the loop in R instead of in s2 always result in the same output), and whether there is a performance loss. I would feel more comfortable to only use the mapply construct when dTolerance has a length > 1 (although the code is somewhat longer, then).

I adjusted the PR as you suggested. However, both of your hunches turned out to be true:
using multiple values for dTolerance on the s2 code path using mapply is very inefficient and does not return identical results. Not sure if there is much to do here on the R side and if it is worth to open an issue upstream for this feature?
If not, I could make this PR about changing sf documentation to make users aware that supplying multiple values for dTolerance when s2 is switched on is not allowed and maybe issue a warning?

Code Example
remotes::install_github("goergen95/sf", ref = "issue2442")
#> Skipping install of 'sf' from a github remote, the SHA1 (49b5fbd1) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(sf)
#> Linking to GEOS 3.13.0, GDAL 3.9.2, PROJ 9.5.0; sf_use_s2() is TRUE
library(microbenchmark)

nc <- read_sf(system.file("shape/nc.shp", package="sf"))
nc <- st_transform(nc, st_crs("EPSG:4326"))

sf_use_s2(TRUE)  
microbenchmark(
    single_tolerance = {nc1 <- st_simplify(nc, dTolerance = 2500)},
    multi_tolerance = {nc2 <- st_simplify(nc, dTolerance = rep(2500, nrow(nc)))},
    times = 100
)
#> Unit: milliseconds
#>              expr      min        lq      mean    median        uq       max
#>  single_tolerance 18.02828  20.66847  24.17936  23.15384  27.14953  53.99792
#>   multi_tolerance 95.44030 104.83197 111.95288 111.30223 115.69527 162.41727
#>  neval cld
#>    100  a 
#>    100   b
identical(nc1, nc2)
#> [1] FALSE

sf_use_s2(FALSE)
#> Spherical geometry (s2) switched off
suppressWarnings({
    microbenchmark(
        single_tolerance = {nc3 <- st_simplify(nc, dTolerance = 0.025)},
        multi_tolerance = {nc4 <- st_simplify(nc, dTolerance = rep(0.025, nrow(nc)))},
        times = 100
    )})
#> Unit: milliseconds
#>              expr      min       lq     mean   median       uq      max neval
#>  single_tolerance 2.358385 2.825957 3.870791 3.612138 4.845731 6.692318   100
#>   multi_tolerance 2.316221 2.775396 3.899538 3.819797 4.909197 7.329169   100
#>  cld
#>    a
#>    a
identical(nc3, nc4)
#> [1] TRUE

Created on 2024-10-01 with reprex v2.1.0

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14)
#>  os       Ubuntu 22.04.5 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Etc/UTC
#>  date     2024-10-01
#>  pandoc   3.2 @ /usr/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version  date (UTC) lib source
#>  class            7.3-22   2023-05-03 [2] CRAN (R 4.4.1)
#>  classInt         0.4-10   2023-09-05 [1] CRAN (R 4.4.1)
#>  cli              3.6.3    2024-06-21 [1] RSPM (R 4.4.0)
#>  codetools        0.2-20   2024-03-31 [2] CRAN (R 4.4.1)
#>  curl             5.2.3    2024-09-20 [1] RSPM (R 4.4.0)
#>  DBI              1.2.3    2024-06-02 [1] RSPM (R 4.4.0)
#>  digest           0.6.37   2024-08-19 [1] RSPM (R 4.4.0)
#>  e1071            1.7-16   2024-09-16 [1] CRAN (R 4.4.1)
#>  evaluate         1.0.0    2024-09-17 [1] RSPM (R 4.4.0)
#>  fansi            1.0.6    2023-12-08 [1] RSPM (R 4.4.0)
#>  fastmap          1.2.0    2024-05-15 [1] RSPM (R 4.4.0)
#>  fs               1.6.4    2024-04-25 [1] RSPM (R 4.4.0)
#>  glue             1.7.0    2024-01-09 [1] RSPM (R 4.4.0)
#>  htmltools        0.5.8.1  2024-04-04 [1] RSPM (R 4.4.0)
#>  KernSmooth       2.23-24  2024-05-17 [2] CRAN (R 4.4.1)
#>  knitr            1.48     2024-07-07 [1] RSPM (R 4.4.0)
#>  lattice          0.22-6   2024-03-20 [2] CRAN (R 4.4.1)
#>  lifecycle        1.0.4    2023-11-07 [1] RSPM (R 4.4.0)
#>  magrittr         2.0.3    2022-03-30 [1] RSPM (R 4.4.0)
#>  MASS             7.3-60.2 2024-04-26 [2] CRAN (R 4.4.1)
#>  Matrix           1.7-0    2024-04-26 [2] CRAN (R 4.4.1)
#>  microbenchmark * 1.5.0    2024-09-04 [1] CRAN (R 4.4.1)
#>  multcomp         1.4-26   2024-07-18 [1] RSPM (R 4.4.0)
#>  mvtnorm          1.3-1    2024-09-03 [1] RSPM (R 4.4.0)
#>  pillar           1.9.0    2023-03-22 [1] RSPM (R 4.4.0)
#>  pkgconfig        2.0.3    2019-09-22 [1] RSPM (R 4.4.0)
#>  proxy            0.4-27   2022-06-09 [1] CRAN (R 4.4.1)
#>  purrr            1.0.2    2023-08-10 [1] RSPM (R 4.4.0)
#>  R.cache          0.16.0   2022-07-21 [1] RSPM (R 4.4.0)
#>  R.methodsS3      1.8.2    2022-06-13 [1] RSPM (R 4.4.0)
#>  R.oo             1.26.0   2024-01-24 [1] RSPM (R 4.4.0)
#>  R.utils          2.12.3   2023-11-18 [1] RSPM (R 4.4.0)
#>  Rcpp             1.0.13   2024-07-17 [1] RSPM (R 4.4.0)
#>  remotes          2.5.0    2024-03-17 [1] RSPM (R 4.4.0)
#>  reprex           2.1.0    2024-01-11 [1] RSPM (R 4.4.0)
#>  rlang            1.1.4    2024-06-04 [1] RSPM (R 4.4.0)
#>  rmarkdown        2.28     2024-08-17 [1] RSPM (R 4.4.0)
#>  rstudioapi       0.16.0   2024-03-24 [1] RSPM (R 4.4.0)
#>  s2               1.1.7    2024-07-17 [1] CRAN (R 4.4.1)
#>  sandwich         3.1-1    2024-09-15 [1] CRAN (R 4.4.1)
#>  sessioninfo      1.2.2    2021-12-06 [1] RSPM (R 4.4.0)
#>  sf             * 1.0-18   2024-10-01 [1] Github (goergen95/sf@49b5fbd)
#>  styler           1.10.3   2024-04-07 [1] RSPM (R 4.4.0)
#>  survival         3.6-4    2024-04-24 [2] CRAN (R 4.4.1)
#>  TH.data          1.1-2    2023-04-17 [1] RSPM (R 4.4.0)
#>  tibble           3.2.1    2023-03-20 [1] RSPM (R 4.4.0)
#>  units            0.8-5    2023-11-28 [1] CRAN (R 4.4.1)
#>  utf8             1.2.4    2023-10-22 [1] RSPM (R 4.4.0)
#>  vctrs            0.6.5    2023-12-01 [1] RSPM (R 4.4.0)
#>  withr            3.0.1    2024-07-31 [1] RSPM (R 4.4.0)
#>  wk               0.9.3    2024-09-06 [1] CRAN (R 4.4.1)
#>  xfun             0.47     2024-08-17 [1] RSPM (R 4.4.0)
#>  yaml             2.3.10   2024-07-26 [1] RSPM (R 4.4.0)
#>  zoo              1.8-12   2023-04-13 [1] CRAN (R 4.4.1)
#> 
#>  [1] /usr/local/lib/R/site-library
#>  [2] /usr/local/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────