Entry updates or errors in 2017 & 2018?
JacobWren opened this issue · 3 comments
Hello Dr. Kaplan,
Similar to issue #2 that I raised, there are also entries that are identical except for differing incident dates and incident date hours. Again, this occurs only in the years 2017 and 2018, but it is not obvious which row to keep and which to drop. If it is an update, then perhaps it makes sense to keep the row that comes later, as the prior row was likely left in by mistake. On the other hand, the longer the time between incident dates (e.g., 2 months), the more likely it is an error rather than an update, in which case these are separate incidents. Which case do you think is more likely?
By the way, I had a similar issue in which there were identical entries except with differing years (not incident dates). I took this to be an error rather than separate incidents, so I kept only the earliest entry, which was equivalent to keeping the entry whose incident date year matched the year variable.
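For illustration, here is a minimal pandas sketch of the rule described above: among rows that are identical except for the year variable, keep the one whose incident date year matches it. The column names `year` and `incident_date`, the helper name, and the toy data are assumptions made up for this example, not the dataset's actual names.

```python
import pandas as pd

def keep_matching_year(df, year_col="year", date_col="incident_date"):
    """Among rows identical except for year_col, drop those whose
    year_col disagrees with the year of date_col.
    Column names are hypothetical."""
    other_cols = [c for c in df.columns if c != year_col]
    # Rows that share every value except (possibly) the year variable.
    dup = df.duplicated(subset=other_cols, keep=False)
    # Rows whose year variable disagrees with the incident date's year.
    mismatch = pd.to_datetime(df[date_col]).dt.year != df[year_col]
    # Caveat: if no row in a duplicated group matches, the whole group is dropped.
    return df.loc[~(dup & mismatch)]

# Toy data (made up): incident A appears under both 2017 and 2018.
toy_years = pd.DataFrame({
    "unique_incident_id": ["A", "A", "B"],
    "incident_date": ["2017-05-01", "2017-05-01", "2018-02-02"],
    "year": [2017, 2018, 2018],
})
cleaned_years = keep_matching_year(toy_years)
```

Rows that are not part of such a duplicated group are left alone, even if their year variable and incident date disagree.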
An update shouldn't change the incident date or the incident hour, so it seems like an error. Maybe the agency copied an old report and changed only the date and time instead of also changing the other variables. I'd recommend deleting these rows.
@jacobkap Take the case where the rows are identical save the "incident date" variable (when this happens I always have two such rows). Further, suppose that the incident dates differ by three weeks. Would you recommend deleting one row? Don't you think the likeliest case is that these are indeed two distinct incidents? If so, I can simply tweak the "unique incident id" variable of one of them and keep both rows. For reference, the average difference between the incident dates for the row pairs that are identical save the "incident date" variable is just over 30 days.
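For anyone following along, here is a rough pandas sketch of the two steps discussed here: flagging rows that are identical save the incident date (with the gap in days between them), and giving the later row of each pair its own id so both can be kept. The column names `unique_incident_id` and `incident_date`, the function names, and the toy data are assumptions for illustration only.

```python
import pandas as pd

def flag_date_only_duplicates(df, date_col="incident_date"):
    """Add n_dates (distinct dates among rows identical on all other
    columns) and gap_days (latest minus earliest date in that group).
    Column names are hypothetical."""
    other_cols = [c for c in df.columns if c != date_col]
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    grp = out.groupby(other_cols, dropna=False)[date_col]
    out["n_dates"] = grp.transform("nunique")
    out["gap_days"] = (grp.transform("max") - grp.transform("min")).dt.days
    return out

def separate_ids(flagged, id_col="unique_incident_id", date_col="incident_date"):
    """Keep both rows of a flagged pair by suffixing the later row's id."""
    out = flagged.sort_values([id_col, date_col]).copy()
    seq = out.groupby(id_col).cumcount()
    rename = (out["n_dates"] > 1) & (seq > 0)
    out.loc[rename, id_col] = (
        out.loc[rename, id_col].astype(str) + "_" + seq[rename].astype(str)
    )
    return out

# Toy data (made up): A1 differs only by date; C3 is an exact duplicate.
toy = pd.DataFrame({
    "unique_incident_id": ["A1", "A1", "B2", "C3", "C3"],
    "year": [2017, 2017, 2017, 2018, 2018],
    "offense": ["x", "x", "y", "z", "z"],
    "incident_date": ["2017-01-01", "2017-01-22", "2017-03-05",
                      "2018-06-01", "2018-06-01"],
})
flagged = flag_date_only_duplicates(toy)
pairs = flagged[flagged["n_dates"] > 1]
# One gap per pair, then the average across pairs.
mean_gap = pairs.groupby("unique_incident_id")["gap_days"].first().mean()
kept_both = separate_ids(flagged)
```

Inspecting `gap_days` before deciding makes it possible to treat short and long gaps differently rather than applying one rule to every pair.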