openaddresses/pyesridump

Unable to dump source

Opened this issue · 8 comments

Trying to dump https://gismaps.sedgwickcounty.org/arcgis/rest/services/Map/Op_SiteAddress_Dynamic_SP/MapServer/0 returns this error:

2017-03-01 11:44:36,011 - cli.esridump - ERROR - Could not parse response from https://gismaps.sedgwickcounty.org/arcgis/rest/services/Map/Op_SiteAddress_Dynamic_SP/MapServer/0/query?returnCountOnly=true&where=1%3D1&f=json as JSON:

<html><head><title>Request Rejected</title></head><body>The requested URL was rejected. Please consult with your administrator.<br><br>Your support ID is: 11833783245905056836</body></html>

How does one go about dumping a source that is locked down tightly like this one? Is it possible to do with pyesridump as-is? If so, could documentation be added so people who aren't well-versed in arcgis/esri/whatever-term can try different approaches to dealing with problematic servers (this is my situation)?

It looks like their firewall/proxy is misconfigured. Their webmap has query functionality (which makes a request similar to ours) that is failing right now with the same error you're seeing with pyesridump.

Okay, regardless of the misconfigured firewall/proxy on their part, is there any way right now that pyesridump can scrape the data? I tried passing a custom WHERE but got the same thing:

$ esri2geojson -p "WHERE=OBJECTID > 1" https://gismaps.sedgwickcounty.org/arcgis/rest/services/Map/Op_SiteAddress_Dynamic_SP/MapServer/0 us-ks-sedgwick.geojson
2017-03-01 12:51:57,608 - cli.esridump - ERROR - Could not parse response from https://gismaps.sedgwickcounty.org/arcgis/rest/services/Map/Op_SiteAddress_Dynamic_SP/MapServer/0?WHERE=OBJECTID+%3E+1&f=json as JSON:

<html><head><title>Request Rejected</title></head><body>The requested URL was rejected. Please consult with your administrator.<br><br>Your support ID is: 11833783245905262104</body></html>

No, I don't think there's a way pyesridump can handle this server if it won't respond to the /query endpoint.

But you mentioned a brute force method... Did you try it on this layer, and did it work?

I have seen this behavior before on Esri servers. It's pretty rare but does happen.

I can make this request as-is in both wget and chrome and get a JSON response:

https://gismaps.sedgwickcounty.org/arcgis/rest/services/Map/Op_SiteAddress_Dynamic_SP/MapServer/0/1?f=json

So I assume that by starting at 1 (or wherever) and just incrementing until the server returns a 400/404, the data can be scraped.
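
For anyone who wants to try that, here is a rough sketch of the incrementing approach in Python. It is not part of pyesridump. The names `fetch_json` and `iter_features` are made up for illustration, and it assumes that a miss shows up as an HTTP error status, a non-JSON body (like the "Request Rejected" page above), or a JSON body with an `error` key, and that a hit carries the record under a `feature` key.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and decode it as JSON.

    Returns None on an HTTP error status or when the body is not JSON
    (for example the HTML "Request Rejected" page).
    """
    try:
        with urlopen(url) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (HTTPError, ValueError):
        return None


def iter_features(layer_url, start=1, max_misses=1, fetch=fetch_json):
    """Yield features by requesting <layer_url>/<oid>?f=json for rising OIDs.

    Stops after `max_misses` consecutive misses. Both names are
    hypothetical helpers, not pyesridump API.
    """
    oid = start
    misses = 0
    while misses < max_misses:
        doc = fetch("{}/{}?f=json".format(layer_url.rstrip("/"), oid))
        # Assumption: a usable response is a JSON object with a "feature" key.
        if doc is None or "error" in doc or "feature" not in doc:
            misses += 1
        else:
            misses = 0
            yield doc["feature"]
        oid += 1
```

Something like `for feature in iter_features("https://gismaps.sedgwickcounty.org/arcgis/rest/services/Map/Op_SiteAddress_Dynamic_SP/MapServer/0"): ...` would walk the layer one request per record. Raising `max_misses` lets it step over small gaps in the OIDs, at the cost of extra requests at the end.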

That will work for this layer because the first OID is 1, the next one is 2, and so on. In lots of other layers I look at, though, the first OID is not 1 and there are gaps between the OIDs (this is why I do the OID enumeration method), so this won't be repeatable.

The other annoying thing with this is that the Esri server won't reproject for you when you fetch by OID, the way it does when we go through /query.

I've thought about ways to handle id gaps/offset but until a problematic source comes up, simple incrementing will do for now.

latot commented

Hi, just passing by: it is also possible to dump with identify.
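
A sketch of what an identify request could look like, in case it helps. This only builds the URL. `identify_url` is a made-up helper, not something pyesridump provides, the bounding box and image size are placeholder values, and whether this server's firewall lets the request through is untested. The parameter names are the ones the MapServer identify operation uses.

```python
from urllib.parse import urlencode


def identify_url(layer_url, bbox, sr=4326):
    """Build a MapServer identify URL covering `bbox` for one layer.

    `layer_url` ends in the layer id (e.g. .../MapServer/0); identify
    lives on the service, so the id moves into the `layers` parameter.
    Hypothetical helper for illustration only.
    """
    service_url, layer_id = layer_url.rstrip("/").rsplit("/", 1)
    extent = ",".join(str(v) for v in bbox)  # xmin,ymin,xmax,ymax
    params = [
        ("geometry", extent),
        ("geometryType", "esriGeometryEnvelope"),
        ("sr", sr),
        ("layers", "all:{}".format(layer_id)),
        ("tolerance", 0),
        ("mapExtent", extent),
        ("imageDisplay", "256,256,96"),  # width,height,dpi (placeholder)
        ("returnGeometry", "true"),
        ("f", "json"),
    ]
    return "{}/identify?{}".format(service_url, urlencode(params))
```

Tiling the layer's extent into small boxes and calling this once per box would be one way to cover the whole dataset, with the caveat that features on tile edges can come back more than once and need de-duplicating by OID.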