GreenInfo-Network/nyc-crash-mapper-etl-script

How to deal with historic updates to NYC Open Data?

clhenrick opened this issue · 1 comments

It appears that the vehicle collision data on the NYC Open Data Portal is edited after it is published. For example, previously published rows may be removed and values for columns in other rows may be updated, even for rows that date back over a year from the current date.

How do we know when previously published data is updated?

What is the best way to keep the table in CARTO in sync with the updated data?

  • Updates aren't as big of a deal as an UPSERT query can be used

  • Removing rows gets trickier, we can use the unique_key from Socrata, but will need to get values for every row in Socrata each time the database is updated. Could then do a DELETE with a WHERE unique_key NOT IN socrata_id.

What if a mistake is made by someone at Socrata (such as incorrect rows being deleted) and it should be avoided synced with the table at CARTO?

Update to 2016 data made manually