hypertidy/silicate

fix ARC

Closed this issue ยท 14 comments

ARC only works properly for polygons, or line sets that actually join as a ring.

Need multiple overlaps handle properly as well.

  • install example data
  • ensure sc_node works for SC
  • ensure sc_arc works for SC
  • (consider) reimpliment the PATH methods from SC
  • clarify the the silicate-view of this
library(stplanr)
library(sf)
x <- routes_fast_sf[as.numeric(st_length(x)) > 0, "ID"]
plot(x, lwd = nrow(x):1)  ## lots of complicated overlap

# 2975 instances
nrow(st_coordinates(x))
library(silicate)
sc <- SC(x)
# 830 unique vertices, 858 unique edges
#class       : SC
#type        : Primitive
#vertices    : 830 (2-space)
#primitives  : 858 (1-space)
#crs         : +proj=longlat +datum=WGS84 +no_defs

# 2926 instances of edges
sc$object_link_edge

library(dplyr)
sc$object_link_edge %>% group_by(edge_) %>% summarize(nlaps = n()) %>% group_by(nlaps) %>% tally()

## between 1 and 14 instances (no 9 or 13)
# A tibble: 12 x 2
#     nlaps     n
#     <int> <int>
# 1     1   299
# 2     2   156
# 3     3    47
# 4     4   150
# 5     5    25
# 6     6    78
# 7     7     8
# 8     8    41
# 9    10    23
# 10   11    15
# 11   12    11
# 12   14     5

With @Robinlovelace's stplanr issue and the osmdata_sc() function we've got plenty of fodder to aid resolution of this . I guess this means I should start impelementing osmdata_sc (x, sc_verb = "ARC")? (see this issue

The crux thing that's required in my opinion is a "path finder" from arbitrary edge sets. If we SC a data set, the nodes are then the vertices with "3 or more edge-connections", i.e.

sc_node.SC <- function(x, ...) {
  alledge <- tibble::tibble(v = c(x$edge$.vx0, x$edge$.vx1),
                            e = c(x$edge$edge_, x$edge$edge_))
  v0 <- alledge %>% group_by(v) %>% tally() %>% filter(n > 2)
  tibble::tibble(vertex_ = sort(unique(v0$v)))
}

and from that basis you then identify the paths that link between those nodes. ARC() is already doing this properly for polygon layers, but from the SC model we really need those paths to be reconstructed from raw edges. All the rest is joins and so forth.

That's the dodgr graph contraction algorithm, the complication there being directed edges, which mean n >2 only for one-way edges, but either n>3 or n>4 for other cases. It's more complicated than I had thought before we started implementing it.

Edit: oh, and 2 only works like that when directions are opposing. So the bigger discussion I'd like to have is about compatability of directed graphs with SC in general

Ah indeed indeed, !n == 2.

I feel that directed stuff can come out later - I get mighty confused about it, but I think the way I do it in SC now is helpful. You have to normalize all edges (remove duplicated instances) and you do that by parallel sort, but also record each instance of the edge and what its orientation was - either oriented as the edge is now (.vx0, .vx1) native_ = TRUE or not (.vx1, .vx0) native_ = FALSE. So in a polygon layer there's either 1 or 2 instances, and if two you have one native_ and one not native_. In a complexly layered data set (like routes_fast_sf) there are just more instances - and some are native orientation, some aren't - or they are explicitly orientated by values stored on them, but there's a "royal" instance on the edge table, and maybe there should be a rule about what the royal orientation is (but then the geometry can change arbitrarily, and orientation can be re-inferred anyway so not sure that's important).

Then directed edges guide the question of which instances you keep, and what values you store with them - but the graph itself is agnostic (or arbitrary) to any particular orientation.

(SF can't do this at all, and geometry algorithms return in some y then x, or plane-sweep order to the frustration of many who want a line to run a particular way)

"edge_id" in dodgr_contract_graph identifies instances, yeah?

Here's a rough stab, I understand dodgr better now!

library(dodgr)

to_dodgr <- function(x, ...) {
  UseMethod("to_dodgr")
}
to_dodgr.SC <- function(x, ...) {
  x$object_link_edge %>% dplyr::inner_join(x$edge) %>% 
    dplyr::transmute(from_id = ifelse(native_, .vx0, .vx1), 
                     to_id = ifelse(native_, .vx1, .vx0), d = 1, 
                     edge_id = row_number(), edge_orig = edge_)
}
sc <- SC(routes)

instances <- to_dodgr(sc)
cg <- dodgr_contract_graph(instances)


als <- cg$edge_map  %>%  split(.$edge_new) %>% purrr::map(~ 
                          dplyr::inner_join(.x, instances %>% mutate(edge_id = as.character(edge_id)), c("edge_old" = "edge_id")) 
                                                  )
library(dplyr)
expand_verts <- function(a) {
  a %>% inner_join(sc$vertex, c("from_id" = "vertex_")) %>% dplyr::rename(x0 = x_, y0 = y_) %>% 
  inner_join(sc$vertex, c("to_id" = "vertex_"))
}
plot(sc)
for (i in seq_along(als)) {
  tab <- expand_verts(als[[i]])  
  segments(tab$x0, tab$y0, tab$x_, tab$y_, col = sample(rainbow(length(als)), 1), lwd = 4)
}

points(sc_node(sc) %>% inner_join(sc$vertex) %>% select(x_, y_))

It's not getting every arc, and other scary issues:

  • I might be in trouble using integer IDs, where dodgr is converting to character (see my casts above)
  • don't give dodgr_contract_graph the edge table, it crashes (i.e. from_id = .vx0, to_id = .vx1 - that probably needs catching)
  • I lost sight of the fact that my edge_id is an instance id, so there's a join back to the unique edge, it works it's just confusing with all the names
  • I haven't checked anything except via this plot

image

Oh that's all grand, but just one clarification, if I may... You mean (.vx0,.vx1, _native=T) == (.vx1,.vx0, _native=F) and (.vx0,.vx1, _native=F) == (.vx1,.vx0, _native=T), but not the other way round? That would indeed allow immediate expression of dodgr-type structures in these terms. Coolio!

Edit: can we use _dir instead of _native? I've just got a general aversion to the latter term. Or do you envision it being more than just _dir? Do or will other tables have comparable columns? And if so, what might be some use cases for such?

I don't totally get your "You mean ..." sentence.

When SC0 is run, its topology_ column has .vx0 and .vx which are the native order of edge pairs from the input data (sf polygons or lines). The vertices are unique, but these edge pairs obviously are not - every object has its own copy of any particular vertex-pair index.

When SC is run, those pairs are also made unique - by using pmin/pmax https://github.com/hypertidy/silicate/blob/master/R/SC-model.R#L50-L51 - and so we can remove repeated instances (they are all "undirected" in one sense). But the object_link_edge table records those original instances and whether the native index in edge matches the orientation they came in with.

To me it's a way of recording the way the data came up, so it's reconstructable up to this point. As we filter and modify edges obviously we drift away from the original data. I'm not sure yet how traditional directed/undirected graphs should be seen in this way. Does that make sense?

Bingo!

library(dodgr)

to_dodgr <- function(x, ...) {
  UseMethod("to_dodgr")
}
# to_dodgr.SC <- function(x, ...) {
#   x$object_link_edge %>% dplyr::inner_join(x$edge) %>% 
#     dplyr::transmute(from_id = ifelse(native_, .vx0, .vx1), 
#                      to_id = ifelse(native_, .vx1, .vx0), d = 1, 
#                      edge_id = row_number(), edge_orig = edge_)
# }


# to_dodgr.SC <- function(x, ...) {
#   dplyr::bind_rows(x$object_link_edge %>% dplyr::inner_join(x$edge) %>% 
#     dplyr::transmute(from_id = .vx0,  to_id = .vx1), 
#     x$object_link_edge %>% dplyr::inner_join(x$edge) %>% 
#       dplyr::transmute(from_id = .vx1, 
#                        to_id = .vx0)
#   ) %>%  dplyr::mutate(d = 1, edge_id = row_number())
# }
to_dodgr.SC <- function(x, ...) {
  edge <- x$edge %>% transmute(from_id = .vx0, to_id = .vx1, edge_)
  edge2 <- tibble(from_id = edge$to_id, to_id  = edge$from_id, edge_ = edge$edge_)
  bind_rows(edge, edge2) %>% mutate(edge_id = row_number(), d = 1)
}

sc <- SC(routes)

instances <- to_dodgr(sc)
cg <- dodgr_contract_graph(instances)


als <- cg$edge_map  %>%  split(.$edge_new) %>% purrr::map(~ 
                                                            dplyr::inner_join(.x, instances %>% mutate(edge_id = as.character(edge_id)), c("edge_old" = "edge_id")) 
)
library(dplyr)
expand_verts <- function(a) {
  a %>% inner_join(sc$vertex, c("from_id" = "vertex_")) %>% dplyr::rename(x0 = x_, y0 = y_) %>% 
    inner_join(sc$vertex, c("to_id" = "vertex_"))
}
#plot(sc$vertex[1:2])
plot(sc)
cols <- viridis::viridis(length(als))
for (i in seq_along(als)) {
  tab <- expand_verts(als[[i]])  
  segments(tab$x0, tab$y0, tab$x_, tab$y_, col = sample(cols, 1), lwd = 4)
}


points(sc_node(sc) %>% inner_join(sc$vertex) %>% select(x_, y_))

It's a mess, but I have to leave this for now - just had to try that! (I put a 'routes' object in silicate, a filtered copy of routes_fast_sf - no 0-length lines).

image

I think it's fair to say the routes dataset has been shredded in ways its creators never conceived of. Great work by the looks of it. And heads-up @richardellison. Guess you'll be interested in this.

This is a simpler way to identify shared vertices from terminal vertices. (Current ARC is failing to find terminal vertices, as it was premised around polygons. )

library(sf)
sm_routes <- sf::st_as_sf(rmapshaper::ms_simplify(as(routes, "Spatial")))
sfx <- sm_routes[c(25, 27), ]
sfx <- minimal_mesh
sc <- SC0(sfx)
#sc <- SC0(silicate:::minimal_line)
## on creation, SC0 has all edge instances in the order they occur
sc$object$object_ <- seq_len(nrow(sc$object))
edge <- tidyr::unnest(sc$object[c("object_", "topology_")])


## nodes that occur twice or more
shared_verts <- edge$.vx0[which(!match(edge$.vx0, edge$.vx1) == (seq_len(nrow(edge)) - 1))]
## nodes that occur only once (for lines)
terminal_verts <- tibble::tibble(vertex_ = as.vector(t(as.matrix(edge[c(".vx0", ".vx1")])))) %>% 
  dplyr::count(.data$vertex_) %>% dplyr::filter(n == 1) %>% dplyr::pull(.data$vertex_)

plot(sfx, lwd = c(10, 2), reset = FALSE)
if (length(shared_verts) > 0) points(sc$vertex[shared_verts, c("x_", "y_")], col = "red", cex = 1.5)
if (length(terminal_verts) > 0) points(sc$vertex[terminal_verts, c("x_", "y_")], col = "green", cex = 1, pch = 19)

It works for various line overlaps and for polygons. We need to be able to also identify when a shared vertex is (only?) shared by an enclosing path, then that node-classification is enough to rebuild a sensible ARC/ARC0.

ARC print is wrong

x <- structure(list(TRANSEG_ID = c(6084063L, 4296322L, 4299053L, 6169126L, 
6084048L, 6168802L, 6169130L), TRANS_TYPE = c("Road", "Road", 
"Road", "Road", "Road", "Road", "Road"), TSEG_FEAT = c("Normal Transport Segment", 
"Intersection Segment", "Normal Transport Segment", "Normal Transport Segment", 
"Normal Transport Segment", "Intersection Segment", "Dual Carriageway"
), SHAPE = structure(list(structure(list(structure(c(525974.353, 
525941.38, 5251023.617, 5251011.371), .Dim = c(2L, 2L))), class = c("XY", 
"MULTILINESTRING", "sfg")), structure(list(structure(c(525974.353, 
525978.84, 525984.763, 525991.963, 525997.328, 525998.134, 5251023.617, 
5251023.215, 5251020.462, 5251013.262, 5251005.962, 5251004.866
), .Dim = c(6L, 2L))), class = c("XY", "MULTILINESTRING", "sfg"
)), structure(list(structure(c(526019.274, 525974.353, 5251050.548, 
5251023.617), .Dim = c(2L, 2L))), class = c("XY", "MULTILINESTRING", 
"sfg")), structure(list(structure(c(525974.353, 525969.462, 525963.162, 
525960.884, 5251023.617, 5251033.762, 5251044.663, 5251048.042
), .Dim = c(4L, 2L))), class = c("XY", "MULTILINESTRING", "sfg"
)), structure(list(structure(c(525960.884, 525957.162, 525929.042, 
5251048.042, 5251053.563, 5251090.63), .Dim = 3:2)), class = c("XY", 
"MULTILINESTRING", "sfg")), structure(list(structure(c(525941.38, 
525949.487, 525956.543, 525959.671, 525961.378, 525961.803, 525961.089, 
525960.884, 5251011.371, 5251018.537, 5251024.258, 5251028.506, 
5251032.895, 5251037.627, 5251045.204, 5251048.042), .Dim = c(8L, 
2L))), class = c("XY", "MULTILINESTRING", "sfg")), structure(list(
    structure(c(525984.056, 525982.162, 525978.962, 525974.862, 
    525974.353, 5250981.35, 5250996.662, 5251010.562, 5251022.562, 
    5251023.617), .Dim = c(5L, 2L))), class = c("XY", "MULTILINESTRING", 
"sfg"))), class = c("sfc_MULTILINESTRING", "sfc"), precision = 0, bbox = structure(c(xmin = 525929.042, 
ymin = 5250981.35, xmax = 526019.274, ymax = 5251090.63), class = "bbox"), crs = structure(list(
    epsg = 28355L, proj4string = "+proj=utm +zone=55 +south +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"), class = "crs"), n_empty = 0L)), row.names = c(NA, 
-7L), sf_column = "SHAPE", agr = structure(c(TRANSEG_ID = NA_integer_, 
TRANS_TYPE = NA_integer_, TSEG_FEAT = NA_integer_), .Label = c("constant", 
"aggregate", "identity"), class = "factor"), class = c("sf", 
"tbl_df", "tbl", "data.frame"))

This should be 7 paths, 30 coordinates.

ARC(x)
class       : ARC
type        : Sequential
vertices    : 23 (2-space)
paths       : 4 (1-space)
coordinates : 22

Dump ARC except for the working polygon case

done, this needs a revisit one fine day