robustness to relational assumptions
Opened this issue · 4 comments
silicate is currently very naive in what it tries to do, and if there's a disconnected between the set of object$object_
and object$object_link_edge
then conversions to SC0
and so on will fail (try SC0(x)
).
Assuming:
- parent table ids should be unique, effectively primary keys
- all parent ids exist in link tables
we can get around both of these, by filtering out parents that don't get reference, and ignoring links that go nowhere - but there'll need to be some kind of formalism around this.
Reprex for a given example from osmdata_sc:
l
#> Error in eval(expr, envir, enclos): object 'l' not found
library(osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
x <- opq ("hampi india") %>%
add_osm_feature (key="historic", value="ruins") %>%
osmdata_sc ()
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
nrow(distinct(x$object, object_)) == nrow(x$object)
#> [1] FALSE
setdiff(x$object$object_, x$object_link_edge$object_)
#> [1] "339933896" "339934121" "340357917" "340371192" "340371784"
#> [6] "340371819" "340372074" "340372141" "977472885" "980400748"
#> [11] "980400822" "1148814767" "1427080593" "1427116083" "1676740758"
#> [16] "1676740761" "3456932077" "3907790727" "4066327806" "4550329208"
silicate::SC0(x)
#> Error in `[[<-.data.frame`(`*tmp*`, "topology_", value = list(`1` = structure(list(: replacement has 21 rows, data has 97
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#> setting value
#> version R version 3.5.1 (2018-07-02)
#> os Ubuntu 18.04.1 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Etc/UTC
#> date 2018-11-27
#>
#> ─ Packages ──────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.0 2017-04-11 [2] CRAN (R 3.5.1)
#> backports 1.1.2 2017-12-13 [2] CRAN (R 3.5.1)
#> base64enc 0.1-3 2015-07-28 [2] CRAN (R 3.5.1)
#> bindr 0.1.1 2018-03-13 [2] CRAN (R 3.5.1)
#> bindrcpp 0.2.2 2018-03-29 [2] CRAN (R 3.5.1)
#> callr 3.0.0.9001 2018-11-13 [2] Github (r-lib/callr@be7707d)
#> cli 1.0.1.9000 2018-11-13 [2] Github (r-lib/cli@56538e3)
#> crayon 1.3.4 2017-09-16 [2] CRAN (R 3.5.1)
#> curl 3.2 2018-03-28 [2] CRAN (R 3.5.1)
#> debugme 1.1.0 2017-10-22 [2] CRAN (R 3.5.1)
#> desc 1.2.0 2018-11-13 [2] Github (r-lib/desc@7c12d36)
#> devtools 2.0.1 2018-10-26 [2] CRAN (R 3.5.1)
#> digest 0.6.18 2018-10-10 [2] CRAN (R 3.5.1)
#> dplyr * 0.7.8 2018-11-10 [2] CRAN (R 3.5.1)
#> evaluate 0.12 2018-10-09 [2] CRAN (R 3.5.1)
#> fs 1.2.6 2018-08-23 [2] CRAN (R 3.5.1)
#> gibble 0.1.2.9001 2018-11-27 [1] local
#> glue 1.3.0 2018-07-17 [2] CRAN (R 3.5.1)
#> htmltools 0.3.6 2017-04-28 [2] CRAN (R 3.5.1)
#> httr 1.3.1 2017-08-20 [2] CRAN (R 3.5.1)
#> jsonlite 1.5 2017-06-01 [2] CRAN (R 3.5.1)
#> knitr 1.20 2018-02-20 [2] CRAN (R 3.5.1)
#> lattice 0.20-38 2018-11-04 [4] CRAN (R 3.5.1)
#> lubridate 1.7.4 2018-04-11 [2] CRAN (R 3.5.1)
#> magrittr 1.5 2014-11-22 [2] CRAN (R 3.5.1)
#> memoise 1.1.0 2017-04-21 [2] CRAN (R 3.5.1)
#> osmdata * 0.0.8.099 2018-11-27 [1] Github (ropensci/osmdata@94e377e)
#> pillar 1.3.0 2018-07-14 [2] CRAN (R 3.5.1)
#> pkgbuild 1.0.2 2018-10-16 [2] CRAN (R 3.5.1)
#> pkgconfig 2.0.2 2018-08-16 [2] CRAN (R 3.5.1)
#> pkgload 1.0.2 2018-10-29 [2] CRAN (R 3.5.1)
#> prettyunits 1.0.2 2015-07-13 [2] CRAN (R 3.5.1)
#> processx 3.2.0.9000 2018-11-13 [2] Github (r-lib/processx@8374340)
#> ps 1.2.1 2018-11-06 [2] CRAN (R 3.5.1)
#> purrr 0.2.5 2018-05-29 [2] CRAN (R 3.5.1)
#> R6 2.3.0 2018-10-04 [2] CRAN (R 3.5.1)
#> Rcpp 1.0.0 2018-11-07 [2] CRAN (R 3.5.1)
#> remotes 2.0.2 2018-10-30 [2] CRAN (R 3.5.1)
#> rlang 0.3.0.1 2018-10-25 [2] CRAN (R 3.5.1)
#> rmarkdown 1.10 2018-06-11 [2] CRAN (R 3.5.1)
#> rprojroot 1.3-2 2018-01-03 [2] CRAN (R 3.5.1)
#> rvest 0.3.2 2016-06-17 [2] CRAN (R 3.5.1)
#> sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 3.5.1)
#> silicate 0.1.5.9001 2018-11-27 [1] local
#> sp 1.3-1 2018-06-05 [2] CRAN (R 3.5.1)
#> stringi 1.2.4 2018-07-20 [2] CRAN (R 3.5.1)
#> stringr 1.3.1 2018-05-10 [2] CRAN (R 3.5.1)
#> testthat 2.0.1 2018-10-13 [2] CRAN (R 3.5.1)
#> tibble 1.4.2 2018-01-22 [2] CRAN (R 3.5.1)
#> tidyr 0.8.2 2018-10-28 [2] CRAN (R 3.5.1)
#> tidyselect 0.2.5 2018-10-11 [2] CRAN (R 3.5.1)
#> usethis 1.4.0 2018-08-14 [2] CRAN (R 3.5.1)
#> withr 2.1.2 2018-03-15 [2] CRAN (R 3.5.1)
#> xml2 1.2.0 2018-01-24 [2] CRAN (R 3.5.1)
#> yaml 2.2.0 2018-07-25 [2] CRAN (R 3.5.1)
#>
#> [1] /perm_storage/home/mdsumner/R/x86_64-pc-linux-gnu-library/3.5
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
Created on 2018-11-27 by the reprex package (v0.2.1)
A superficial fix for the start of some of the issues here is in this remote commit. That at least introduces an initial degree of robustness that I think should be present, although of course this has effects that will still need to be processed downstream because the resultant table will have empty topology_
entries.
There are also of course larger issues at play. I do concur with your general assumptions / visions up until now that the object
and object_link_edge
tables should generally accord. I do think this also makes sense in the osmdata_sc
context, and can mostly be reconciled by ensuring that every object_link_edge$object_
is also present in the object
table itself. This will not necessarily be so, because my osmdata_sc
vision to now has been of the object
table more as equivalent to a "feature" matrix so that, for example, SC(an_sfc_list)
would yield an empty object table. I can easily reconcile that with your vision by ensuring that nrow(SC(an_sfc_list)$object) == nrow(SC(an_sfc_list)$object_link_edge)
. This requires simply inserting some effectively empty entries in the object
table, much as the commit linked above does for SC0
.
It does not, however, solve a coupla larger issues.
osmdata_sc
will still generate a lot of duplicatedobject$object_
entries for reasons described in theosmdata_sc
issue, so in generalnrow(x$object)
will be quite a bit more thannrow(x$object_link_edge)
, and I don't see any way around that (allsphier
-like considerations and whatnot). That of course may make it tricky when joiningobject
andobject_link_edge
tables, but theobject
table could easily bespread
prior to that act, I'd suggest.- (This one is entirely my fault, no criticism at all, but ...) I still don't see the real logic behind
identical(SC(minimal_mesh), SC(minimal_mesh) %>% SC()) # FALSE
? I need that vision to really be able to work towards a solution here.
They are randomly generate UIDs, no two will ever be the same (though SC0 ought to be, with some caveats around coordinate precision/identity).
identical(SC(minimal_mesh), SC(minimal_mesh))
# [1] FALSE
set.seed(10)
x <- SC(minimal_mesh)
set.seed(10)
identical(x, SC(minimal_mesh))
[1] TRUE
Oh yeah, stupid me, of course. Ta! Oh no, but hey, this is going to affect the osmdata_sc
kind of stuff, because those IDs should be enduring and propagated throughout. I'll delve in myself there ...
There is a missing SC.SC <- function(x, ...) x
- the other models have that shortcut - but I guess you meant more generally as well.