yonghah/esri2sf

`esriUrl_isValid()` returning error for working MapServer URL

elipousson opened this issue · 8 comments

It looks like the new URL checks added to this package may be too strict. With the current version of esri2sf, I'm getting the following error for this MapServer URL: https://egisdata.baltimorecity.gov/egis/rest/services/CityView/Liquor_Licenses/MapServer/0

Error in value[[3L]](cond): 
Url is not a valid ESRI Map or Feature Service Url.
Could not access url with {httr}.

I confirmed that the URL works fine with the version of esri2sf that preceded the addition of the esriUrl_isValid() function.

@jacpete - would you mind taking a look at this since you contributed the new functionality? Thanks!

Looking into it now

#37 should fix the issue. I will open an issue on the httr package to ask why we were getting that error, since per their documentation it seems as if it should have worked. Let me know if you have any other issues.

Started issue in r-lib/httr#707 to ask for some clarification.

Thanks so much for responding so quickly!

It seems the issue has to do with the fact that httr::http_error() uses HEAD() under the hood to get the status instead of a GET(), and that particular server likely has HEAD requests disabled. See my comment on the httr issue here if curious: r-lib/httr#707 (comment)

httr::HEAD("https://egisdata.baltimorecity.gov/egis/rest/services/CityView/Liquor_Licenses/MapServer/0")
# Response [https://egisdata.baltimorecity.gov/egis/rest/services/CityView/Liquor_Licenses/MapServer/0]
#   Date: 2021-10-27 18:50
#   Status: 405
#   Content-Type: text/html;charset=utf-8
# <EMPTY BODY>
httr::http_status(httr::HEAD("https://egisdata.baltimorecity.gov/egis/rest/services/CityView/Liquor_Licenses/MapServer/0"))
# $category
# [1] "Client error"
# 
# $reason
# [1] "Method Not Allowed"
# 
# $message
# [1] "Client error: (405) Method Not Allowed"
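For comparison, a GET request to the same URL is allowed by the server, which is why switching the check from HEAD to GET works around the 405. A sketch, assuming the Baltimore City server is still reachable and still rejects HEAD:

```r
# Same URL, but requested via GET rather than HEAD. Because GET is
# permitted, httr::http_error() on this response reflects the real
# availability of the service instead of a spurious 405.
resp <- httr::GET("https://egisdata.baltimorecity.gov/egis/rest/services/CityView/Liquor_Licenses/MapServer/0")
httr::http_status(resp)
httr::http_error(resp)
```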

One question: would the workaround you mention in the issue description for httr cause problems if the FeatureServer has a very large number of features? Baltimore City switched its open data platform from Socrata to ArcGIS (with very mixed results, IMHO), so the datasets are often pretty hefty.

I don't think so, because the httr::http_error(httr::GET(url)) sequence is only used in the esriUrl_isValid() function, which expects a URL like https://egisdata.baltimorecity.gov/egis/rest/services/CityView/Liquor_Licenses/MapServer or https://egisdata.baltimorecity.gov/egis/rest/services/CityView/Liquor_Licenses/MapServer/0 as input:

esri2sf/R/esriUrl.R

Lines 85 to 115 in ee22d2d

esriUrl_isValid <- function(url, displayReason = FALSE) {
  # check url succeeds
  urlError <- tryCatch({
    httr::http_error(httr::GET(url))
  }, error = function(cond) {TRUE})
  if (!grepl("/rest/services", url)) {
    reason <- "'/rest/services' not found in the url."
    out <- FALSE
  } else if (!grepl("/MapServer|/FeatureServer", url)) {
    reason <- "'/MapServer' or '/FeatureServer' not found in the url."
    out <- FALSE
  } else if (!grepl("/MapServer$|/FeatureServer$|/[[:digit:]]+$", url)) {
    reason <- "Url does not end in '/MapServer' or '/FeatureServer' or a layer/table ID."
    out <- FALSE
  } else if (urlError) {
    reason <- "Could not access url with {httr}."
    out <- FALSE
  } else if (!is.na(rvest::html_element(rvest::read_html(url), 'div.restErrors'))) {
    reason <- sub("^[[:space:]]*", "", rvest::html_text(rvest::html_element(rvest::read_html(url), 'div.restErrors')))
    out <- FALSE
  } else {
    out <- TRUE
  }
  if (!out & displayReason) {
    message(paste0("Url is not a valid ESRI Map or Feature Service Url.\n", reason))
  }
  return(out)
}
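The first three branches above are pure string tests, so they can be illustrated without any network access. A quick sketch of which URL shapes they accept (the second URL is a hypothetical truncated variant for contrast):

```r
# Pure string checks mirroring the first three branches of esriUrl_isValid().
ok  <- "https://egisdata.baltimorecity.gov/egis/rest/services/CityView/Liquor_Licenses/MapServer/0"
bad <- "https://egisdata.baltimorecity.gov/egis/rest/services/CityView/Liquor_Licenses"

grepl("/rest/services", ok)                              # TRUE
grepl("/MapServer$|/FeatureServer$|/[[:digit:]]+$", ok)  # TRUE: ends in a layer ID
grepl("/MapServer|/FeatureServer", bad)                  # FALSE: fails the second check
```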

These checks are currently done at the beginning of the esri2sf(), esri2df(), and esrimeta() functions:

esri2sf/R/esri2sf.R

Lines 37 to 47 in ee22d2d

esri2sf <- function(url, outFields = c("*"), where = "1=1", bbox = NULL, token = "",
                    geomType = NULL, crs = 4326, progress = FALSE, replaceDomainInfo = TRUE, ...) {
  # make sure url is valid and error otherwise
  tryCatch(
    {
      esriUrl_isValidID(url, displayReason = TRUE)
    }, message = function(m) {
      stop(m$message)
    }
  )

esri2sf/R/esri2sf.R

Lines 101 to 110 in ee22d2d

esri2df <- function(url, outFields = c("*"), where = "1=1", token = "", progress = FALSE, replaceDomainInfo = TRUE, ...) {
  # make sure url is valid and error otherwise
  tryCatch(
    {
      esriUrl_isValidID(url, displayReason = TRUE)
    }, message = function(m) {
      stop(m$message)
    }
  )

esri2sf/R/esri2sf.R

Lines 130 to 139 in ee22d2d

esrimeta <- function(url, token = "", fields = FALSE) {
  # make sure url is valid and error otherwise
  tryCatch(
    {
      esriUrl_isValid(url, displayReason = TRUE)
    }, message = function(m) {
      stop(m$message)
    }
  )

These aren't the modified URLs where data is actually requested, so the GET request should only return the HTML page associated with the MapServer, FeatureServer, or feature ID page that you would see when you click one of the links above. I haven't seen a use case where any of these landing pages would be very large, even for a feature ID page with a lot of fields listed. But if you know of an example where this could be an issue, by all means bring it to my attention and we can look for another solution.
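To make the distinction concrete: the validator only GETs the small landing page, while data is fetched later from the layer's query endpoint, whose response size depends on the query, not on the validity check. A sketch of the two URL shapes (the query string shown is illustrative, following the general ArcGIS REST API pattern):

```r
# Landing page checked by esriUrl_isValid() -- a small HTML page
# regardless of how many features the layer holds.
landing <- "https://egisdata.baltimorecity.gov/egis/rest/services/CityView/Liquor_Licenses/MapServer/0"

# Query endpoint used later for actual data requests; this is where a
# hefty layer produces a large (paginated) response.
query <- paste0(landing, "/query?where=1%3D1&outFields=*&f=json")
```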

Cool. Thanks for clarifying.