dcooley/sfheaders

problem with sf_to_df

mdsumner opened this issue · 9 comments

There an issue with polygon/line encoding as data frame, here the polygon_id is incremented incorrectly (it's not clear to me if IDs generally are "global" or "within group hierarchy", but it looks like the latter (i.e. IDs reset within top-level, but not every level).

x <- silicate::minimal_mesh
library(sfheaders)
library(dplyr)
sf_to_df(x) %>% tibble::as_tibble()
# A tibble: 19 x 6
   sfg_id multipolygon_id polygon_id linestring_id     x     y
    <dbl>           <dbl>      <dbl>         <dbl> <dbl> <dbl>
 1      1               1          1             1  0     0   
 2      1               1          1             1  0     1   
 3      1               1          1             1  0.75  1   
 4      1               1          1             1  1     0.8 
 5      1               1          1             1  0.5   0.7 
 6      1               1          1             1  0.8   0.6 
 7      1               1          1             1  0.69  0   
 8      1               1          1             1  0     0   
 9      1               1          2             1  0.2   0.2 
10      1               1          2             1  0.5   0.2 
11      1               1          2             1  0.5   0.4 
12      1               1          2             1  0.3   0.6 
13      1               1          2             1  0.2   0.4 
14      1               1          2             1  0.2   0.2 
15      2               2          1             1  0.69  0   
16      2               2          1             1  0.8   0.6 
17      2               2          1             1  1.1   0.63
18      2               2          1             1  1.23  0.3 
19      2               2          1             1  0.69  0 

Above, polygon_id should be constant at 1. But linestring_id should change (at row 9) to differentiate the hole in the first feature.

x1 <- sf_to_df(x) %>% tibble::as_tibble() %>% 
  sf_multipolygon(x  = "x", y = "y", linestring_id = "linestring_id",  
                  polygon_id = "polygon_id", multipolygon_id = "multipolygon_id")
library(sf)
plot(x1)

image

This is a fix (for this example, not a general one).

tab <- sf_to_df(x) %>% tibble::as_tibble() 
tab$linestring_id <- tab$polygon_id
tab$polygon_id <- 1
x2 <- tab %>% 
  sf_multipolygon(x  = "x", y = "y", linestring_id = "linestring_id",  
                  polygon_id = "polygon_id", multipolygon_id = "multipolygon_id")
plot(x2)

image

In related matters, I think you should - I hope you also consider (I'm thinking about this atm) - whether group IDs always increment or reset within groups (not sure it matters, but I settled on a run-length encoding in gibble, with subobjects identified for multipolygons. It means you need some grouping handling to restore the sfheader-alike df inputs, and I'm not on top of it atm, been meaning to see the relationship properly here - it would be neat if we shared the gibble (or similar) as a compact representation of what sfheaders treats as encoding).

As a user we can't tell immediately that the above is broken, but with this example we can:

x <- silicate::inlandwaters
library(sfheaders)
library(dplyr)
sf_to_df(x) %>% tibble::as_tibble() %>% 
  sf_multipolygon(x  = "x", y = "y", linestring_id = "linestring_id",  
                  polygon_id = "polygon_id", multipolygon_id = "multipolygon_id")
x <- silicate::inlandwaters
library(sfheaders)
library(dplyr)
sf_to_df(x) %>% tibble::as_tibble() %>% 
  sf_multipolygon(x  = "x", y = "y", linestring_id = "linestring_id",  
                  polygon_id = "polygon_id", multipolygon_id = "multipolygon_id")

What it should be is below, done with gibble and sfheaders (and the possible shared stuff I'd like to discuss more):

  x <- silicate::inlandwaters
gib <- gibble::gibble(x)

## easy expansion, but not sfheaders encoding because IDs are "global"
df <- tibble::tibble(linestring_id = rep(seq_len(nrow(gib)), gib$nrow),
                     polygon_id = rep(gib$subobject, gib$nrow),
                     multipolygon_id = rep(gib$object, gib$nrow))

df$x <- sf::st_coordinates(x)[,1]
df$y <- sf::st_coordinates(x)[,2]
xx <- sfheaders::sf_multipolygon(df, x = "x", y = "y",
                           multipolygon_id = "multipolygon_id",
                           polygon_id = "polygon_id",
                           linestring_id = "linestring_id")
plot(xx)

for (i in seq_len(nrow(x))) {
print(sf::st_difference(sf::st_set_crs(x$geom, NA)[i], xx$geometry[i]))
}
#> Geometry set for 0 features 
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID):    NA
#> proj4string:    NA
#> Geometry set for 0 features 
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID):    NA
#> proj4string:    NA
#> Geometry set for 0 features 
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID):    NA
#> proj4string:    NA
#> Geometry set for 0 features 
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID):    NA
#> proj4string:    NA
#> Geometry set for 0 features 
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID):    NA
#> proj4string:    NA
#> Geometry set for 0 features 
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> epsg (SRID):    NA
#> proj4string:    NA

Created on 2020-03-05 by the reprex package (v0.3.0)

Thanks for reporting this - I can confirm it's an issue with the sfc_ conversion, not the sfg

devtools::install_github("hypertidy/silicate")

x <- silicate::minimal_mesh

str( x )

library(sf)

x[1, ]$geom[[1]]

sfheaders::sf_to_df( x[1, ] )
sfheaders::sfc_to_df( x[1, ]$geom )
sfheaders::sfg_to_df( x[1, ]$geom[[1]] )

fix and test in this patch

Sweet, thanks!

FWIW, this is what I was going for after the sf_remove_holes tweet, when I hit the bug:

x <- silicate::inlandwaters
#x <- silicate::minimal_mesh
library(dplyr)
library(sf)
library(sfheaders)
xx <- sf_to_df(x) %>% tibble::as_tibble() %>% 
  group_by(multipolygon_id, polygon_id)  %>% 
  filter(linestring_id == 1)  %>%   ## holes are linestring_id 1 within each polygon
  sf_multipolygon(x  = "x", y = "y",  polygon_id = "polygon_id", multipolygon_id = "multipolygon_id")
plot(xx)

I originally pursued this stuff in spbabel, but that predated sf so never had the multipolygon thing right. I'm hoping to resurrect that with sfheaders, so you could do stuff like this (just FYI, very happy you've done the hard yards here!):

x <- silicate::inlandwaters
#x <- silicate::minimal_mesh
library(dplyr)
library(sf)
library(sfheaders)


sftable <- function(x, ...) {
  sfheaders::sf_to_df(x)
}

'sftable<-' <- function(x, ..., value) {
  sfheaders::sf_multipolygon(value, 
                             x = "x", y = "y",
                             linestring_id = "linestring_id", 
                             polygon_id = "polygon_id", 
                             multipolygon_id = "multipolygon_id")
}

## then hole-removing tidy stuff like above has a general basis to modify with
## replacement (at least for multipolygon)
sftable(x) <- sftable(x)  %>% 
  group_by(multipolygon_id, polygon_id)  %>% 
  filter(linestring_id == 1)  
plot(x)

Would it be helpful to have an ignore_holes = TRUE argument inside sf_polygon() & sf_multipolygon() ?

Also, I'm going to close this issue because It's now fixed, but happy to continue the discussion on another issue (if we implement ignore_holes), or over on your repo if you have suggestions / questions on how to integrate sfheaders

cool thanks (I don't see the need for ignore_holes that's not what I was going for)

I thought it might be quicker to include it as a step inside the sf_polygon() construction, rather than having your filter( linestring_id == 1 ) step.

Oh, to me the df modification is a general capacity, a pathway we can customise for all kinds of things - removing holes is just an example