invinst/chicago-police-data

How did we aggregate the different data files that make up each FOIA response?


Questions for @ithinkidunno:

When we say "the April data" (for example), are we referring to the result of appending all these files in shootings-data/Raw/FOIA_April2016/ together?

218 Resp SS_2012.xls
218 Resp SS_2013.xls
218 Resp SS_2014.xls
218 Resp SS_2015.xls
218 Resp SS_2016.xls

Is the outcome of appending all these files stored anywhere in this repo?

And is that what Compare_FOIA.do is doing?

What is the difference between the FOIA_April2016 folder and the raw dump April2016 folder?

What is the difference between 218 Resp SS_2012.xls and All complaints during 2012.xls?

Would we lose anything by reformatting these smaller Excel files as CSV? CSV would be easier to open and work with.

There’s no issue with exporting the data from Excel files to CSV. That’s a good idea. Even for the more complex files with multiple sheets, we should probably just break them up into separate files (with one CSV file per sheet in Excel) and save them as a folder. I still think it’s best to keep all of the original documents available in the repo, in case someone else needs them for whatever reason.
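A rough sketch of the "one CSV per sheet, saved as a folder" idea, in Python. The function names and the output layout are hypothetical, not something already in this repo. The sketch assumes pandas is installed, plus xlrd for the older .xls format, and it reads every cell as text so that IDs and dates are not silently reformatted.

```python
import csv
from pathlib import Path


def write_sheets_as_csv(sheets, out_dir):
    """Write each sheet (name -> list of rows) to out_dir/<sheet name>.csv."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in sheets.items():
        target = out_dir / f"{name}.csv"
        # newline="" lets the csv module control line endings itself
        with target.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        written.append(target)
    return written


def load_workbook_sheets(xls_path):
    """Read every sheet of a workbook into plain rows, header row first."""
    # Imported lazily so the CSV writer above works without pandas.
    # Assumption: pandas is available, and xlrd for .xls input.
    import pandas as pd

    # sheet_name=None returns a dict of {sheet name: DataFrame}
    frames = pd.read_excel(xls_path, sheet_name=None, dtype=str)
    return {
        name: [list(frame.columns)] + frame.fillna("").values.tolist()
        for name, frame in frames.items()
    }
```

Usage would look something like `write_sheets_as_csv(load_workbook_sheets("218 Resp SS_2012.xls"), "218 Resp SS_2012")`, leaving the original .xls file untouched next to the new folder.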

Re: the FOIA_April2016 folder and the raw dump April2016 folder

I think that’s a merge mistake where a move command turned into a copy action. You should feel free to eliminate the less useful version.

I’m assuming that they’re both the exact same file size – right?
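Matching file sizes would not prove the contents match, so before deleting either copy it may be worth comparing the two folders file by file. Here is a minimal sketch in Python that hashes every file in both trees; the function names are hypothetical and the folder paths in the usage note are guesses at the repo layout.

```python
import hashlib
from pathlib import Path


def hash_tree(root):
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_folders(left, right):
    """Report files found on only one side and files whose contents differ."""
    left_hashes, right_hashes = hash_tree(left), hash_tree(right)
    shared = left_hashes.keys() & right_hashes.keys()
    return {
        "only_left": sorted(left_hashes.keys() - right_hashes.keys()),
        "only_right": sorted(right_hashes.keys() - left_hashes.keys()),
        "differing": sorted(n for n in shared if left_hashes[n] != right_hashes[n]),
    }
```

Calling `compare_folders("FOIA_April2016", "raw dump April2016")` from the folder that contains both (assumed location) should return three empty lists if the copies are identical. Anything listed under `only_left` or `only_right` exists in just one of the two folders.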

DGalt commented

Bocar mentioned last night that all of the FOIA_*X* folders were the same as the raw_dump_*X* folders, plus some of his Stata analysis files. He said that we should toss the FOIA_*X* folders and just keep the raw_dump_*X* folders.

I can confirm they're the same.

Closed via #20. Thanks for the helpful answers @rajivsinclair and @ithinkidunno!