invinst/chicago-police-data

Assign all POs in all_sworn a unique ID?

Closed this issue · 6 comments

DGalt commented

I'm currently working on putting together data sets for the individuals in the May and April dumps that I can confidently ID as either police officers or not police officers, and I'm finding that there currently is no good way to uniquely ID a particular officer. Their name alone isn't enough, so I end up having to use a combination of different data sources to ID them.

This is fine, except that when I want to go back and look at that officer again (or match them in another data set), I have to use those same sources all over again to ID them.

It might be worth considering assigning all of the officers in the all_sworn data set some kind of unique ID, so that when I identify someone in the April and May data sets as one particular officer, I can assign that entry the same unique ID. I think it would make cross-referencing these data sets easier. @rajivsinclair I know that we don't have employee IDs, but have you all discussed assigning some kind of equivalent ID # to the entries in all_sworn for this purpose?

Not sure if this helps, but in the Feb data it looks like the officers have a unique PERS_STAR_NO code here [https://github.com/invinst/shootings-data/blob/master/Clean/Feb2016/dat_feb2016_officer.csv]

DGalt commented

This is equivalent to Star1, Star2, etc. in all_sworn I'm assuming? The problem with the star numbers is that they can potentially change over time / be reused - right @rajivsinclair?

Yeah, their star numbers change when they get promoted (or demoted) and get a new badge. This blog explains the star number ranges for different ranks: http://chgopdfan.tripod.com/id12.html (There are some star number changes that don't seem to be explained by promotion/demotion, too.) The May data csv has fields for up to 10 star numbers per individual. An individual's star number can change over time, and eventually each star number is reissued.

I'm fairly certain you could create a unique ID by combining the individual's first star number and date of appointment. A duplicate seems very unlikely, and then you'd have something linked to this and future CPD data instead of a number assigned by us.
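A rough sketch of that idea, to check how often the combined key actually collides. The column names `star1` and `appointed_date` and the rows below are made up for illustration; the real all_sworn fields may be named differently.

```python
from collections import Counter


def composite_id(first_star, appointed_date):
    """Join the first star number and appointment date into one key."""
    return f"{first_star}-{appointed_date}"


def find_duplicates(records):
    """Return composite IDs that occur more than once."""
    counts = Counter(
        composite_id(r["star1"], r["appointed_date"]) for r in records
    )
    return sorted(key for key, n in counts.items() if n > 1)


# Made-up rows for illustration only; not real officers.
records = [
    {"star1": 1234, "appointed_date": "1999-10-25"},
    {"star1": 5678, "appointed_date": "2009-12-16"},
    {"star1": 1234, "appointed_date": "1999-10-25"},  # deliberate collision
]

print(find_duplicates(records))
```

Running something like this over the full data set would show whether the "very unlikely" assumption holds before anyone relies on the key.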

hi all, I'm going to ask @rajivsinclair to chime in later, with a more comprehensive answer, but for now:

  • yes! star numbers are recycled (this is actually a holdover from the days when manufacturing badges was expensive!)
  • date of appointment is not always consistent (officers sometimes re-enter the department)

the unique identifier we've been assigning follows this format:

first name - last name - middle initial - birth year - date of appointment - race - gender

examples:
JAMES-BANSLEY-A-1983-2009-12-16-WHITE-M
JOEL-BENTLEY-A-1976-1999-10-25-WHITE-M
KEVIN-CONNORS-M-1975-1999-09-13-WHITE-M
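
For reference, a minimal sketch of how an identifier in this format could be assembled. The function name and argument names are hypothetical, not the code we actually use; only the field order and the dash separator come from the format above.

```python
from datetime import date


def build_officer_uid(first_name, last_name, middle_initial,
                      birth_year, appointed, race, gender):
    """Build an ID in the order: first name, last name, middle initial,
    birth year, date of appointment (YYYY-MM-DD), race, gender."""
    parts = [
        first_name,
        last_name,
        middle_initial,
        str(birth_year),
        appointed.isoformat(),  # date -> "YYYY-MM-DD"
        race,
        gender,
    ]
    # Trim whitespace and upper-case every field before joining.
    return "-".join(p.strip().upper() for p in parts)


uid = build_officer_uid("James", "Bansley", "A", 1983,
                        date(2009, 12, 16), "White", "M")
print(uid)  # JAMES-BANSLEY-A-1983-2009-12-16-WHITE-M
```

Note that names containing hyphens would make the ID ambiguous to split back into fields, so this sketch is only suitable for building and comparing IDs, not parsing them.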

@chaclyn @DGalt @banoonoo2 @rajivsinclair I think we can use the unique IDs that the CPDB database already uses. The CPDB database is usually updated with all the data, so we think most of the sworn officer data is already there. What do you guys think?

DGalt commented

If there is already a system in place within the CPDB database, then yeah, I'd think it'd be best to just use that. Is this what you're referring to above, @chaclyn?