ropensci/geojsonio

as.json "arcs" seems to be missing set of brackets

osserman opened this issue · 10 comments

Session Info
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.15.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] geojsonio_0.8.0 sp_1.3-1 RPostgreSQL_0.6-2 DBI_1.0.0.9001

loaded via a namespace (and not attached):
[1] remotes_2.0.0 sf_0.6-3 lattice_0.20-35
[4] testthat_2.0.0 V8_2.0 usethis_1.4.0
[7] yaml_2.2.0 base64enc_0.1-3 rlang_0.4.0
[10] pkgbuild_1.0.2 geojsonlint_0.3.0 e1071_1.7-0
[13] pillar_1.4.2 glue_1.3.1 foreign_0.8-70
[16] httpcode_0.2.0 withr_2.1.2 sessioninfo_1.1.0
[19] rgeos_0.5-2 devtools_2.0.0 memoise_1.1.0
[22] callr_3.0.0 maptools_0.9-8 ps_1.1.0
[25] curl_4.0 class_7.3-14 Rcpp_1.0.2
[28] readr_1.1.1 backports_1.1.4 classInt_0.2-3
[31] jsonvalidate_1.0.0 desc_1.2.0 pkgload_1.0.1
[34] jsonlite_1.6 fs_1.2.6 hms_0.4.2
[37] digest_0.6.20 stringi_1.4.3 processx_3.2.0
[40] grid_3.5.1 rprojroot_1.3-2 jqr_1.1.0
[43] rgdal_1.4-3 cli_1.1.0 tools_3.5.1
[46] magrittr_1.5 lazyeval_0.2.1 geojson_0.3.2
[49] tibble_2.1.3 crul_0.7.0 crayon_1.3.4
[52] pkgconfig_2.0.2 prettyunits_1.0.2 spData_0.2.9.4
[55] getcartr_1.01 assertthat_0.2.1 httr_1.4.0
[58] rstudioapi_0.7 R6_2.4.0 units_0.6-0
[61] compiler_3.5.1

Hi all. Thanks for the great package!

I've been running into problems when converting from topojson_list to topojson_json via as.json(). I think I've tracked down the issue, which is in the case of single-arc polygons as.json(topojson_list) seems to produce topojson that is missing a level of nested brackets in the objects.foo.geometries[[n]].arcs list. I ran into this initially with a handful multipolygons in my file that contain single-arc polygons (damn islands!).

Here's some (hopefully) reproducible code to show the behavior:

spdf1 <- SpatialPolygonsDataFrame(
  SpatialPolygons(list(
    Polygons(list(
      Polygon(cbind(x = c(2,2,3,2), y = c(2,3,2,2))),
      Polygon(cbind(x = c(1,2,2,1), y = c(4,4,5,4)))
    ), ID = 1)
  )),
  data = data.frame(a = 1)
)

output of:
topojson_json(spdf1)
{"type":"Topology","objects":{"foo":{"type":"GeometryCollection","geometries":[{"type":"MultiPolygon","arcs":[[[0]],[[1]]],"id":1,"properties":{"a":1}}]}},"arcs":[[[2,2],[2,3],[3,2],[2,2]],[[1,4],[2,4],[2,5],[1,4]]],"bbox":[1,2,3,5]}

output of:
as.json(topojson_list(spdf1))
{"type":"Topology","objects":{"foo":{"type":"GeometryCollection","geometries":[{"type":"MultiPolygon","arcs":[[0],[1]],"id":1,"properties":{"a":1}}]}},"arcs":[[[2,2],[2,3],[3,2],[2,2]],[[1,4],[2,4],[2,5],[1,4]]],"bbox":[1,2,3,5]}

The "arcs":[[0],[1]] compared to "arcs":[[[0]],[[1]]] in the former seems to be what is making my topojson interpret incorrectly in d3.

After writing this I got curious about single-arc simple polygons and it seems like there's the same problem there as well. With "arcs":[0] instead of "arcs":[[0]]

spdf2 <- SpatialPolygonsDataFrame(
  SpatialPolygons(list(
    Polygons(list(
      Polygon(cbind(x = c(2,2,3,2), y = c(2,3,2,2)))
    ), ID = 1)
  )),
  data = data.frame(a = 1)
)
topojson_json(spdf2)
as.json(topojson_list(spdf2))

Thanks in advance!

thanks for this @osserman !

So is the output of topojso_json correct? And the output of topojson_list is incorrect?

as.json calls an internal fxn to_json, which is really just a call to jsonlite::toJSON. If we do

library(geojsonio)
library(sp)
spdf1 <- SpatialPolygonsDataFrame(
  SpatialPolygons(list(
    Polygons(list(
      Polygon(cbind(x = c(2,2,3,2), y = c(2,3,2,2))),
      Polygon(cbind(x = c(1,2,2,1), y = c(4,4,5,4)))
    ), ID = 1)
  )),
  data = data.frame(a = 1)
)
z <- topojson_list(spdf1)

# same as as.json() gives you
jsonlite::toJSON(z, digits = 7, auto_unbox = TRUE, force = TRUE)
#> {"type":"Topology","objects":{"foo":{"type":"GeometryCollection","geometries":[{"type":"MultiPolygon","arcs":[[0],[1]],"id":1,"properties":{"a":1}}]}},"arcs":[[[2,2],[2,3],[3,2],[2,2]],[[1,4],[2,4],[2,5],[1,4]]],"bbox":[1,2,3,5]}

# setting auto_unbox=FALSE
jsonlite::toJSON(z, digits = 7, auto_unbox = FALSE, force = TRUE)
#> {"type":["Topology"],"objects":{"foo":{"type":["GeometryCollection"],"geometries":[{"type":["MultiPolygon"],"arcs":[[[0]],[[1]]],"id":[1],"properties":{"a":[1]}}]}},"arcs":[[[2,2],[2,3],[3,2],[2,2]],[[1,4],[2,4],[2,5],[1,4]]],"bbox":[1,2,3,5]}

Ah -- cool, sorry I didn't follow as.json() to it's source. But yeah topojson_json() is producing json that is interpretable in d3 for me, and as.json(topojson_list()) is what fails. Just tried a few things, and confirming that for topojson_json to be interpretable (at least in my case, but I think more broadly) almost all elements in the topojson_list needs to be unboxed; except the array of arcs for each feature, which need to remain boxed.

it's a bit complicated to automatically make sure arcs don't get mis-handled because they can be nested within a list.

Curious, why are you going topojson_list to as.json? Why not just topojson_json? Are you making a list to edit that list, then convert to json?

Yeah, I'm testing some simplification approaches that happen on the topojson-arc level. Doing that in topojson_list form is the most convenient / only way I've figured out how to implement.

But just came up with a workaround that at least works for my use case. (Details below if anyone else is curious / stumbles on the same issue).

In terms of a more stable / general solution, happy to help if there's some way I could be useful -- let me know! And in the meantime thanks for helping me understand the problem enough to get me to the workaround below,
Stephen

Details of my current workaround:

For my purposes, I'm just editing the topojson_list$arcs -- not other elements of the topojson_list ($type, $objects, or $bbox). The as.json() issue with auto-unboxing only happens to the $arcs element of each object's geometries not the actual coordinates of the arcs themselves in topojson_list$arcs.

So I can use a regex to splice the valid as.json(simplified_topojson_list$arcs) into the valid topojson_json(original_shapefile) like this:

arcs_simpified <- paste0('"arcs":' , as.json(topojson_list_simpified$arcs))
spliced_topojson <- sub('("arcs)(?!.*\\1).*]]]', arcs_simpified, topojson_json(original_shp), perl = T) 
class(spliced_topojson) <- c('json')

Glad you sorted out a solution that works.

in terms of changes in this package for this issue, there would need to be a solution that would be fast and no chance of failing, perhaps some solution walking down nested lists

I finally had a moment to revisit this in search of a faster and more general solution. This is working for my use-cases and isn't appreciably slower than the current as.json() function.

I played with at a few potential recursive solutions that imitate the logic of toJSON more closely, but ended up with this approach instead -- which first toJSON()'s any $arcs in each of the $geometries with auto_unbox = FALSE - leaving the structure of topojson_list otherwise intact; then toJSON()'s the whole topojson_list with json_verbatim = TRUE to make sure it doesn't treat the $arcs - which are now json - as strings.

This obviously assumes relatively consistency in the topojson_list structure -- specifically that geometries are always stored in the first list element of objects. That seemed to be the case for the examples I could find, but not sure if I'm missing something or if this is too brittle an approach.

Anyway, offering it as a better workaround than my last one if not the seed of a change to the package for the issue.

as.json.topojson_list <- function(x){
  for(i in seq_along(x$objects[[1]]$geometries)){
    if('arcs' %in% names(x$objects[[1]]$geometries[[i]])){
      x$objects[[1]]$geometries[[i]]$arcs <- jsonlite::toJSON(x$objects[[1]]$geometries[[i]]$arcs, digits = 7, auto_unbox = FALSE, force = TRUE)
    }
  }
  return(jsonlite::toJSON(x,digits = 7, auto_unbox = TRUE, force = TRUE, json_verbatim = TRUE))
}

I checked microbenchmarks for very simple polygon and point examples and also for a topojson_list built from a larger SpatialPolygonDataFrame (1.5 MB, 117 MultiPolygons). All of the tests looked fine. Here's the microbenchmark for the larger one:

Unit: seconds
                     expr      min       lq     mean   median       uq      max neval
 as.json.topojson_list(x) 1.105342 1.153443 1.271643 1.192843 1.278742 2.510970   100
    geojsonio::as.json(x) 1.104114 1.137936 1.272738 1.166760 1.242438 3.027229   100

thanks and sorry for the delay. topojson_list() doesn't have a s3 class attached to it, so we can't dispatch on the as.json method. but we could maybe itnernally within as.json detect if the obect is likely topojson, and if so, use your code above. does that sound good?

I'm incredibly sorry for the delayed response @sckott . Totally lost this amidst the pandemic. That solution sounds great to me if still relevant / doable. Thanks!

Okay, yes still doable. I'll take a look at it soon

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.