hypertidy/silicate

robustness to relational assumptions

Opened this issue · 4 comments

silicate is currently very naive in what it tries to do, and if there's a disconnected between the set of object$object_ and object$object_link_edge then conversions to SC0 and so on will fail (try SC0(x)).

Assuming:

  • parent table ids should be unique, effectively primary keys
  • all parent ids exist in link tables

we can get around both of these, by filtering out parents that don't get reference, and ignoring links that go nowhere - but there'll need to be some kind of formalism around this.

Reprex for a given example from osmdata_sc:

l
#> Error in eval(expr, envir, enclos): object 'l' not found
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
x <- opq ("hampi india") %>%
  add_osm_feature (key="historic", value="ruins") %>%
  osmdata_sc ()

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
nrow(distinct(x$object, object_)) == nrow(x$object)
#> [1] FALSE


setdiff(x$object$object_, x$object_link_edge$object_)
#>  [1] "339933896"  "339934121"  "340357917"  "340371192"  "340371784" 
#>  [6] "340371819"  "340372074"  "340372141"  "977472885"  "980400748" 
#> [11] "980400822"  "1148814767" "1427080593" "1427116083" "1676740758"
#> [16] "1676740761" "3456932077" "3907790727" "4066327806" "4550329208"


silicate::SC0(x)
#> Error in `[[<-.data.frame`(`*tmp*`, "topology_", value = list(`1` = structure(list(: replacement has 21 rows, data has 97
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.5.1 (2018-07-02)
#>  os       Ubuntu 18.04.1 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_AU.UTF-8                 
#>  ctype    en_AU.UTF-8                 
#>  tz       Etc/UTC                     
#>  date     2018-11-27                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                           
#>  assertthat    0.2.0      2017-04-11 [2] CRAN (R 3.5.1)                   
#>  backports     1.1.2      2017-12-13 [2] CRAN (R 3.5.1)                   
#>  base64enc     0.1-3      2015-07-28 [2] CRAN (R 3.5.1)                   
#>  bindr         0.1.1      2018-03-13 [2] CRAN (R 3.5.1)                   
#>  bindrcpp      0.2.2      2018-03-29 [2] CRAN (R 3.5.1)                   
#>  callr         3.0.0.9001 2018-11-13 [2] Github (r-lib/callr@be7707d)     
#>  cli           1.0.1.9000 2018-11-13 [2] Github (r-lib/cli@56538e3)       
#>  crayon        1.3.4      2017-09-16 [2] CRAN (R 3.5.1)                   
#>  curl          3.2        2018-03-28 [2] CRAN (R 3.5.1)                   
#>  debugme       1.1.0      2017-10-22 [2] CRAN (R 3.5.1)                   
#>  desc          1.2.0      2018-11-13 [2] Github (r-lib/desc@7c12d36)      
#>  devtools      2.0.1      2018-10-26 [2] CRAN (R 3.5.1)                   
#>  digest        0.6.18     2018-10-10 [2] CRAN (R 3.5.1)                   
#>  dplyr       * 0.7.8      2018-11-10 [2] CRAN (R 3.5.1)                   
#>  evaluate      0.12       2018-10-09 [2] CRAN (R 3.5.1)                   
#>  fs            1.2.6      2018-08-23 [2] CRAN (R 3.5.1)                   
#>  gibble        0.1.2.9001 2018-11-27 [1] local                            
#>  glue          1.3.0      2018-07-17 [2] CRAN (R 3.5.1)                   
#>  htmltools     0.3.6      2017-04-28 [2] CRAN (R 3.5.1)                   
#>  httr          1.3.1      2017-08-20 [2] CRAN (R 3.5.1)                   
#>  jsonlite      1.5        2017-06-01 [2] CRAN (R 3.5.1)                   
#>  knitr         1.20       2018-02-20 [2] CRAN (R 3.5.1)                   
#>  lattice       0.20-38    2018-11-04 [4] CRAN (R 3.5.1)                   
#>  lubridate     1.7.4      2018-04-11 [2] CRAN (R 3.5.1)                   
#>  magrittr      1.5        2014-11-22 [2] CRAN (R 3.5.1)                   
#>  memoise       1.1.0      2017-04-21 [2] CRAN (R 3.5.1)                   
#>  osmdata     * 0.0.8.099  2018-11-27 [1] Github (ropensci/osmdata@94e377e)
#>  pillar        1.3.0      2018-07-14 [2] CRAN (R 3.5.1)                   
#>  pkgbuild      1.0.2      2018-10-16 [2] CRAN (R 3.5.1)                   
#>  pkgconfig     2.0.2      2018-08-16 [2] CRAN (R 3.5.1)                   
#>  pkgload       1.0.2      2018-10-29 [2] CRAN (R 3.5.1)                   
#>  prettyunits   1.0.2      2015-07-13 [2] CRAN (R 3.5.1)                   
#>  processx      3.2.0.9000 2018-11-13 [2] Github (r-lib/processx@8374340)  
#>  ps            1.2.1      2018-11-06 [2] CRAN (R 3.5.1)                   
#>  purrr         0.2.5      2018-05-29 [2] CRAN (R 3.5.1)                   
#>  R6            2.3.0      2018-10-04 [2] CRAN (R 3.5.1)                   
#>  Rcpp          1.0.0      2018-11-07 [2] CRAN (R 3.5.1)                   
#>  remotes       2.0.2      2018-10-30 [2] CRAN (R 3.5.1)                   
#>  rlang         0.3.0.1    2018-10-25 [2] CRAN (R 3.5.1)                   
#>  rmarkdown     1.10       2018-06-11 [2] CRAN (R 3.5.1)                   
#>  rprojroot     1.3-2      2018-01-03 [2] CRAN (R 3.5.1)                   
#>  rvest         0.3.2      2016-06-17 [2] CRAN (R 3.5.1)                   
#>  sessioninfo   1.1.1      2018-11-05 [2] CRAN (R 3.5.1)                   
#>  silicate      0.1.5.9001 2018-11-27 [1] local                            
#>  sp            1.3-1      2018-06-05 [2] CRAN (R 3.5.1)                   
#>  stringi       1.2.4      2018-07-20 [2] CRAN (R 3.5.1)                   
#>  stringr       1.3.1      2018-05-10 [2] CRAN (R 3.5.1)                   
#>  testthat      2.0.1      2018-10-13 [2] CRAN (R 3.5.1)                   
#>  tibble        1.4.2      2018-01-22 [2] CRAN (R 3.5.1)                   
#>  tidyr         0.8.2      2018-10-28 [2] CRAN (R 3.5.1)                   
#>  tidyselect    0.2.5      2018-10-11 [2] CRAN (R 3.5.1)                   
#>  usethis       1.4.0      2018-08-14 [2] CRAN (R 3.5.1)                   
#>  withr         2.1.2      2018-03-15 [2] CRAN (R 3.5.1)                   
#>  xml2          1.2.0      2018-01-24 [2] CRAN (R 3.5.1)                   
#>  yaml          2.2.0      2018-07-25 [2] CRAN (R 3.5.1)                   
#> 
#> [1] /perm_storage/home/mdsumner/R/x86_64-pc-linux-gnu-library/3.5
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library

Created on 2018-11-27 by the reprex package (v0.2.1)

A superficial fix for the start of some of the issues here is in this remote commit. That at least introduces an initial degree of robustness that I think should be present, although of course this has effects that will still need to be processed downstream because the resultant table will have empty topology_ entries.

There are also of course larger issues at play. I do concur with your general assumptions / visions up until now that the object and object_link_edge tables should generally accord. I do think this also makes sense in the osmdata_sc context, and can mostly be reconciled by ensuring that every object_link_edge$object_ is also present in the object table itself. This will not necessarily be so, because my osmdata_sc vision to now has been of the object table more as equivalent to a "feature" matrix so that, for example, SC(an_sfc_list) would yield an empty object table. I can easily reconcile that with your vision by ensuring that nrow(SC(an_sfc_list)$object) == nrow(SC(an_sfc_list)$object_link_edge). This requires simply inserting some effectively empty entries in the object table, much as the commit linked above does for SC0.

It does not, however, solve a coupla larger issues.

  1. osmdata_sc will still generate a lot of duplicated object$object_ entries for reasons described in the osmdata_sc issue, so in general nrow(x$object) will be quite a bit more than nrow(x$object_link_edge), and I don't see any way around that (all sphier-like considerations and whatnot). That of course may make it tricky when joining object and object_link_edge tables, but the object table could easily be spread prior to that act, I'd suggest.
  2. (This one is entirely my fault, no criticism at all, but ...) I still don't see the real logic behind
identical(SC(minimal_mesh), SC(minimal_mesh) %>% SC()) # FALSE

? I need that vision to really be able to work towards a solution here.

They are randomly generate UIDs, no two will ever be the same (though SC0 ought to be, with some caveats around coordinate precision/identity).

identical(SC(minimal_mesh), SC(minimal_mesh))
# [1] FALSE

set.seed(10)
x <- SC(minimal_mesh)
set.seed(10)
identical(x, SC(minimal_mesh))
[1] TRUE

Oh yeah, stupid me, of course. Ta! Oh no, but hey, this is going to affect the osmdata_sc kind of stuff, because those IDs should be enduring and propagated throughout. I'll delve in myself there ...

There is a missing SC.SC <- function(x, ...) x - the other models have that shortcut - but I guess you meant more generally as well.