openaddresses/pyesridump

This query returns 26 duplicates for every feature

stevevance opened this issue · 3 comments

esri2geojson -f ICN,TotalInjured,OInjuries,AInjuries,BInjuries,CInjuries,CrashInjurySeverity,IsHitAndRun,ContribCausePrim,ContribCauseSec,CrashReportCity,CrashDateTimeText,TotalFatals,FunctionalClassCIS,TypeOfFirstCrash,IsAnyCitation,CrashVehicleCount,AgencyCrashReportNo,IsAlcoholRelated,CISCrashID -p "where=CrashReportCity%3D%27Chicago%27" http://ags10s1.dot.illinois.gov/ArcGIS/rest/services/SafetyPortal/SafetyPortal/MapServer/12 idotcrashes5.geojson
cli.esridump - INFO - Built 26 requests using OID enumeration method

This returns 26 copies of each of 1,000 features (26,000 features in all). According to a simple count in this ArcGIS server's web interface, it should return 12,552 unique records.

CISCrashID is the ObjectID field for this table, and this screenshot shows one crash ID appearing 26 times in the completed GeoJSON file.

[Screenshot 2017-05-20 14:28:35]
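A minimal sketch for confirming the duplication in the output file: it counts features per CISCrashID and keeps the first copy of each. It assumes the file is a GeoJSON FeatureCollection with CISCrashID stored under each feature's properties; dedupe_by_property is a hypothetical helper name. Deduplicating cannot recover the missing data, though: the file holds only 1,000 distinct features, so the other 11,552 expected records were never downloaded.

```python
from collections import Counter


def dedupe_by_property(collection, key="CISCrashID"):
    """Return (deduplicated FeatureCollection, Counter of occurrences per key).

    Hypothetical helper: keeps the first feature seen for each value of
    properties[key] and drops later copies.
    """
    counts = Counter()
    kept = []
    for feature in collection.get("features", []):
        # GeoJSON allows "properties": null, so fall back to an empty dict.
        value = (feature.get("properties") or {}).get(key)
        counts[value] += 1
        if counts[value] == 1:
            kept.append(feature)
    return {"type": "FeatureCollection", "features": kept}, counts
```

To use it, load idotcrashes5.geojson with json.load, pass the result in, and compare len(data["features"]) with len(deduped["features"]); counts.most_common() shows which IDs repeat and how often.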

Overriding the "where" parameter means that esri2geojson can't do its job, because it sets its own "where" on each request to page through the layer. I suppose we could add support for appending to the where clause, but for now you'll need to remove the -p "where=..." argument and do the filtering afterwards.
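A minimal sketch of why an overridden "where" produces duplicates. It assumes the OID enumeration method describes each request with an OID-range where clause and that extra -p arguments are merged on top of the per-request arguments. Both function names are hypothetical, and this is a simplified model, not pyesridump's actual code.

```python
def page_where_clauses(oid_field, oids, page_size):
    # Hypothetical pager: split the sorted object IDs into pages and
    # describe each page with an OID-range WHERE clause.
    oids = sorted(oids)
    for i in range(0, len(oids), page_size):
        chunk = oids[i:i + page_size]
        yield f"{oid_field} >= {chunk[0]} AND {oid_field} <= {chunk[-1]}"


def build_query_args(page_args, extra_args):
    # User-supplied arguments are applied last, so on a key conflict
    # (here, "where") they replace the per-page value entirely.
    merged = dict(page_args)
    merged.update(extra_args)
    return merged


if __name__ == "__main__":
    extra = {"where": "CrashReportCity='Chicago'"}
    pages = [build_query_args({"where": w}, extra)
             for w in page_where_clauses("CISCrashID", range(1, 2501), 1000)]
    # Every request now carries the same filter and no OID range, so the
    # server answers each one with the same first batch of matches.
    print([p["where"] for p in pages])
```

In this model the three pages all end up with where = CrashReportCity='Chicago', which matches the symptom in the issue: one batch of features repeated once per request.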

@iandees Okay, but in this case I added the WHERE clause because the request timed out before it could grab everything. I added it, and limited the fields to fetch, to reduce how much data the ArcGIS server needed to transfer.

Where in the code could I change it to append to the WHERE clause? Here?
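One way the "append to the where clause" idea could work, sketched under assumptions: the pager builds an OID-range clause per request, and a user-supplied clause is ANDed onto it instead of replacing it. combine_where and build_query_args are hypothetical names, not part of pyesridump.

```python
def combine_where(page_where, user_where=None):
    """AND a user-supplied filter onto the pager's own WHERE clause.

    Parenthesizing both sides keeps an OR inside either clause from
    escaping its intended scope.
    """
    if not user_where or not user_where.strip():
        return page_where
    return f"({page_where}) AND ({user_where})"


def build_query_args(page_args, extra_args):
    # Hypothetical merge: user args still override most keys, but "where"
    # is combined so it narrows each page instead of replacing it.
    merged = dict(page_args)
    for key, value in extra_args.items():
        if key == "where" and "where" in merged:
            merged["where"] = combine_where(merged["where"], value)
        else:
            merged[key] = value
    return merged
```

The sketch expects the user's clause already URL-decoded: urllib.parse.unquote("CrashReportCity%3D%27Chicago%27") gives CrashReportCity='Chicago'. Some pages may then come back with fewer than a full page of features, which is harmless because each page still covers a distinct OID range.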

Edit: I got the whole thing to download this time; maybe it was my network.

@stevevance If you run pip install -U esridump in the virtualenv you created, it should work.