yonghah/esri2sf

Allow outSR parameter to be changed in getESRIFeatureByIds

jacpete opened this issue · 3 comments

As of now, the outSR parameter in the query list creation is hard coded to be WGS1984 EPSG:4326. I am guessing this is legacy before the crs argument was added esri2sf. I believe this needs updated so that the output CRS can be specified correctly.

esri2sf/R/zzz.R

Lines 62 to 73 in cbcaf3b

getEsriFeaturesByIds <- function(ids, queryUrl, fields, token = "", ...) {
# create Simple Features from ArcGIS servers json response
query <- list(objectIds = paste(ids, collapse = ","), outFields = paste(fields,
collapse = ","), token = token, outSR = "4326", f = "json", ...)
responseRaw <- content(POST(queryUrl, body = query, encode = "form",
config = config(ssl_verifypeer = FALSE)), as = "text")
response <- fromJSON(responseRaw, simplifyDataFrame = FALSE, simplifyVector = FALSE,
digits = NA)
response$features
}

I am going to fork the repository and do some testing and may upload a pull request.

Essentially what I am seeing here is that getEsriFeaturesByIds() gets called when pulling the json data from the feature/map server. It is currently hardcoded for the EPSG:4326 (line 65 above). However the crs argument in `esri2sf() doesn't get used until the last call in esri2sf().

esri2sf/R/esri2sf.R

Lines 34 to 65 in cbcaf3b

esri2sf <- function(url, outFields = c("*"), where = "1=1", bbox = NULL, token = "",
geomType = NULL, crs = 4326, ...) {
layerInfo <- jsonlite::fromJSON(content(POST(url, query = list(f = "json",
token = token), encode = "form", config = config(ssl_verifypeer = FALSE)),
as = "text"))
print(layerInfo$type)
if (is.null(geomType)) {
if (is.null(layerInfo$geometryType))
stop("geomType is NULL and layer geometry type ('esriGeometryPolygon' or 'esriGeometryPoint' or 'esriGeometryPolyline') could not be inferred from server.")
geomType <- layerInfo$geometryType
}
print(geomType)
if (!is.null(layerInfo$extent$spatialReference$latestWkid))
print(paste0("Coordinate Reference System: ", layerInfo$extent$spatialReference$latestWkid))
if (class(bbox) == "bbox") {
if ((sf::st_crs(bbox)$input != layerInfo$extent$spatialReference$latestWkid) && !is.null(layerInfo$extent$spatialReference$latestWkid)) {
bbox <- sf::st_bbox(sf::st_transform(sf::st_as_sfc(bbox), layerInfo$extent$spatialReference$latestWkid))
}
} else if (!is.null(bbox)) {
stop("The provided bbox must be a class bbox object.")
}
bbox <- paste0(unlist(as.list(bbox), use.names=FALSE), collapse = ",")
queryUrl <- paste(url, "query", sep = "/")
esriFeatures <- getEsriFeatures(queryUrl, outFields, where, bbox, token, ...)
esri2sfGeom(esriFeatures, geomType, crs)
}

This is an issue because we are forcing the data to come in as EPSG:4326 but then we allow the data to be defined as any CRS given to the crs argument in esri2sfGeom().

esri2sf/R/zzz.R

Lines 119 to 134 in cbcaf3b

esri2sfGeom <- function(jsonFeats, geomType, crs = 4326) {
# convert esri json to simple feature
geoms <- switch(geomType,
esriGeometryPolygon = esri2sfPolygon(jsonFeats),
esriGeometryPoint = esri2sfPoint(jsonFeats),
esriGeometryPolyline = esri2sfPolyline(jsonFeats)
)
# attributes
atts <- lapply(lapply(jsonFeats, `[[`, 1),
function(att) lapply(att, function(x) ifelse(is.null(x), NA, x)))
af <- dplyr::bind_rows(lapply(atts, as.data.frame.list, stringsAsFactors = FALSE))
# geometry + attributes
st_sf(geoms, af, crs = crs)
}

I can see this leading to some unexpected errors when trying to define a CRS. I believe sf::st_sf() only directly sets a CRS for a layer and does not attempt a transformation.

I also see another (more opinionated) issue with using EPSG:4326 as the default because it means the user of the package needs to be aware that they may be introducing datum transformation shift error into their dataset if the Feature/Map Service has the data stored in a different datum than WGS1984. While this error is often relatively low (the example I often encounter is WGS1984 to NAD1983 with an error of about 2.5 feet) it is unacceptable in some cases. I see two paths to fix this.

  1. Add documentation pointing users to consider using the datumTransformation query parameter (see https://developers.arcgis.com/rest/services-reference/enterprise/query-map-service-layer-.htm). However, I find this documentation a pretty annoying to use/search and I (IMHO) don't prefer this option. It adds a lot more work to the user to know what they are doing here.
  2. The other option I see would lead to a major change in current functionality. Instead of specifying the outSR query parameter by default and thereby defaulting to EPSG:4326 you would attempt to create the feature with Feature/Service Layers spatial reference. This in my experience is always specified by a WKTID but I think it is possible for it to be a WKT specification (https://developers.arcgis.com/javascript/3/jsapi/spatialreference-amd.html). This could be accomplished by making the default crs argument be NULL and making a few changes to the esri2sfGeom() and getEsriFeaturesByIds() functions. The downside to this is it may break users current code if they expect the data to come back in EPSG:4326. Once the data is in R they can do the transformation with correct datum shift if needed or they could still specify crs = 4326 in the function explicitly.

I am going to work on coding up a pull request for option 2 for a functional example and we can discuss whether or not it should be accepted and/or what the default should be. I guess we could always make the default be crs = 4326 but still add in the crs = NULL functionality and documentation depending on your preferences.

So I've been working on a pull request for this issue and I am hitting a small snag. For the outSR parameter in getEsriFeaturesByIds() ESRI's query format (https://developers.arcgis.com/rest/services-reference/enterprise/query-feature-service-layer-.htm) requires the input to be either a WKTid or spatial reference JSON object that can be formatted to contain an ESRI wkt. In my testing and research, the ESRI wkt is different than the normal OGR WKT generally given by R, but GDAL does have the power to convert to it. What I am looking for now is a way to interface with the GDAL library linked in R through the {sf} package. On linux this is pretty easy because GDAL is compiled and installed separately from the sf install, but on Windows (and I think OSX but I don't run this OS to test) the GDAL library is compiled and installed with the {sf} binaries and it makes it difficult to interface with the GDAL library directly.

Without a way to link to GDAL, we will be limited in our inputs for the crs argument. We will have to restrict it to only using WKTids These can be either EPSG or ESRI authority WKTids though as the ESRI query format seems to be able to handle either without direct specification. However, sf::st_sf() does require direct specification so I added a helper (hidden) function getWKTidAuthority to do a direct query to the proj.db SQLite database in the PROJ library linked to by the {sf} package. This function adds imported dependencies for the {DBI} and {RSQLite} packages which were imported or suggested by {sf} already.

https://github.com/jacpete/esri2sf/blob/62c6e8a3368d1fe9bfe24e400110d5e8606e3fe8/R/zzz.R#L149-L172

To extend the function to be able to accept any crs format you can generally use in R (e.g., proj4string, WKT, numeric EPSG) there will need to be an interface directly to the GDAL gdalsrsinfo command line tool (I think it may be easiest to create and issue in {sf} to extend sf::gdal_utils() for this one; https://gdal.org/programs/gdalsrsinfo.html) or improve the {Rccp} interface to exportToWkt C++ API (https://gdal.org/api/ogrspatialref.html#_CPPv4NK19OGRSpatialReference11exportToWktEPPcPPCKc) with an option to specify the FORMAT parameter and create an exported R function to control it. While I think the implementation of the first option may allow slightly easier access to the functionality, I think the addition of the second may already be being discussed by package authors (see comment on line 115 of code from {sf} below.

https://github.com/r-spatial/sf/blob/405ca1ca4ebdbd5672948aea88251e5d12004740/src/gdal.cpp#L115-L128

Thank you for the detailed comments and the PR. I hope the sf package will address this issue as well.