Copy of the data from The Counted, a Guardian project to count people killed by US law enforcement agencies in 2015 and 2016

What's The Counted?

To quote from the Guardian:

The Counted is a project by the Guardian --- and you --- working to count the number of people killed by police and other law enforcement agencies in the United States throughout 2015 and 2016, to monitor their demographics and to tell the stories of how they died.

And what's this?

The Guardian makes the data behind the project available as a ZIP file and keeps that file up to date, but they don't indicate when the file changes or what has changed when it does.

Fortunately, the ZIP file contains a README and two CSV files (data for 2015 and data for 2016), which are well suited to being stored in Git, a source control system. And since the ZIP file is published on the Web, it's also easy to check it regularly for changes.
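Because each CSV is plain text, Git's line-based diffs show which rows changed between commits. As a rough illustration of what that buys you, here is a small Python sketch that compares two versions of a CSV file row by row and reports added and removed rows. It is not one of this repo's scripts, the function name `diff_csv_rows` is hypothetical, and it treats each row as a whole, so an edited row shows up as one removal plus one addition.

```python
import csv
import io
from collections import Counter


def diff_csv_rows(old_text: str, new_text: str) -> tuple[list[list[str]], list[list[str]]]:
    """Return (added, removed) rows between two versions of a CSV file.

    Rows are compared as whole tuples; duplicate rows are handled by
    counting how many times each one occurs in each version.
    """
    old_rows = Counter(tuple(row) for row in csv.reader(io.StringIO(old_text)))
    new_rows = Counter(tuple(row) for row in csv.reader(io.StringIO(new_text)))
    # Counter subtraction keeps only positive counts.
    added = [list(row) for row in (new_rows - old_rows).elements()]
    removed = [list(row) for row in (old_rows - new_rows).elements()]
    return added, removed


# Toy example with made-up columns (not the Guardian's schema):
old = "id,value\n1,a\n2,b\n"
new = "id,value\n1,a\n2,c\n3,d\n"
added, removed = diff_csv_rows(old, new)
# added   -> [['2', 'c'], ['3', 'd']]
# removed -> [['2', 'b']]
```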

That's where this comes in: the ZIP file is checked every twenty minutes for changes, and if there's anything new it's committed to this repository on GitHub. By keeping track of this repo you can ensure you have the latest version of the data behind The Counted.
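One simple way to decide whether the ZIP has changed, without unpacking it, is to hash the downloaded bytes and compare the result with the hash saved on the previous run. The Python sketch below assumes that approach; it isn't necessarily what this repo's update script does, and `ZIP_URL` and the state-file path are hypothetical placeholders. A ZIP can also be rebuilt with identical contents but new timestamps, so a changed hash only means "worth unpacking"; Git still decides whether any file actually differs.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch_bytes(url: str, timeout: float = 60.0) -> bytes:
    """Download the ZIP file and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def has_changed(zip_bytes: bytes, state_file: Path) -> bool:
    """Compare the SHA-256 of zip_bytes with the hash saved on the last run.

    Writes the new hash to state_file when it differs, so a second call
    with the same bytes returns False.
    """
    digest = hashlib.sha256(zip_bytes).hexdigest()
    previous = state_file.read_text().strip() if state_file.exists() else None
    if digest == previous:
        return False
    state_file.write_text(digest)
    return True


# Hypothetical usage; ZIP_URL stands in for the Guardian's download link:
# data = fetch_bytes(ZIP_URL)
# if has_changed(data, Path.home() / ".thecounted_last_hash"):
#     ...extract into data/ and commit...
```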

Repository contents

Data extracted from the source ZIP file is kept in the data directory on the master branch. No alterations are made to the files themselves, and all the hard work is done by the Guardian's staff.
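Since the files are unaltered, you can read them straight from a clone. Here's a minimal Python sketch that loads every CSV in data/ into a list of row dictionaries. It assumes the files are UTF-8 encoded (check the README inside the ZIP if they aren't), globs for `*.csv` rather than relying on exact file names, and makes no assumptions about the column layout. The name `load_counted` is hypothetical.

```python
import csv
from pathlib import Path


def load_counted(data_dir: Path = Path("data")) -> dict[str, list[dict[str, str]]]:
    """Read every CSV in data_dir into a list of row dicts, keyed by file name."""
    tables = {}
    for path in sorted(data_dir.glob("*.csv")):
        # newline="" is what the csv module expects when reading files.
        with path.open(newline="", encoding="utf-8") as handle:
            tables[path.name] = list(csv.DictReader(handle))
    return tables


if __name__ == "__main__":
    # Print a row count per file, e.g. for a quick check after a pull.
    for name, rows in load_counted().items():
        print(f"{name}: {len(rows)} rows")
```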

Everything outside the data directory is not part of the source data; it's only there to support keeping the data in this repo.

How the repo is updated

Every twenty minutes, cron runs a Python script. The script checks whether the data has been updated and commits any files that have changed.
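The commit step could look something like the Python sketch below, which shells out to the git command line with subprocess. It's an illustration under assumptions, not the contents of scripts/update_repo.py: the function names are hypothetical, it assumes the ZIP has already been extracted into data/, and it assumes the clone's branch already tracks a remote it can push to. A crontab entry along the lines of `*/20 * * * * cd /path/to/repo && python3 scripts/update_repo.py` (path hypothetical) would run a script like this every twenty minutes.

```python
import subprocess


def changed_paths(porcelain: str) -> list[str]:
    """Extract paths from `git status --porcelain` (v1) output.

    Each line is two status characters, a space, then the path. Renames
    ("old -> new") and quoted paths are not handled in this sketch.
    """
    return [line[3:] for line in porcelain.splitlines() if len(line) > 3]


def commit_data(repo_dir: str, data_dir: str = "data") -> bool:
    """Commit and push any changes under data_dir; return True if a commit was made."""
    status = subprocess.run(
        ["git", "status", "--porcelain", "--", data_dir],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    paths = changed_paths(status)
    if not paths:
        return False  # nothing new since the last run
    subprocess.run(["git", "add", "--all", "--", data_dir], cwd=repo_dir, check=True)
    message = "Update data: " + ", ".join(paths)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    # Assumes the branch already has an upstream configured (e.g. origin/master).
    subprocess.run(["git", "push"], cwd=repo_dir, check=True)
    return True
```

Checking `git status` first means an unchanged download produces no empty commit, so the commit history only records real updates.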

The script is kept within this repository as scripts/update_repo.py. To run it you need:

The requirements are listed in requirements.txt and can be installed with pip. To receive Pushover notifications you'll need a config in ~/.pushoverrc, but notifications will fail silently if you don't have one.
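The "fail silently" behaviour can be sketched in Python as below. This assumes ~/.pushoverrc is an INI-style file with a `[Default]` section holding `api_token` and `user_key` (the layout assumed here; check what the package in requirements.txt actually expects). The message is sent as a plain HTTPS POST to Pushover's messages endpoint, and a missing or malformed config, or any network error, is swallowed so the data update itself is never blocked. Function names are hypothetical.

```python
import configparser
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Optional

PUSHOVER_URL = "https://api.pushover.net/1/messages.json"


def load_pushover_config(path: Optional[Path] = None) -> Optional[dict[str, str]]:
    """Return {"token": ..., "user": ...}, or None if the config is missing or unusable."""
    path = path if path is not None else Path.home() / ".pushoverrc"
    parser = configparser.ConfigParser()
    try:
        if not parser.read(path):  # read() returns the list of files it parsed
            return None
    except configparser.Error:  # malformed file: treat it like a missing one
        return None
    token = parser.get("Default", "api_token", fallback=None)
    user = parser.get("Default", "user_key", fallback=None)
    if not token or not user:
        return None
    return {"token": token, "user": user}


def notify(message: str, config: Optional[dict[str, str]] = None) -> bool:
    """Send a Pushover message; return False (never raise) if it can't be sent."""
    config = config if config is not None else load_pushover_config()
    if config is None:
        return False
    data = urllib.parse.urlencode({**config, "message": message}).encode()
    try:
        with urllib.request.urlopen(PUSHOVER_URL, data=data, timeout=30) as response:
            return response.status == 200
    except OSError:  # urllib.error.URLError and HTTPError subclass OSError
        return False
```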

Keeping track of the data

The nerdiest way is to clone the repository and pull regularly, but if you're not of the nerd persuasion then you have a few other options:

  • If you have an account on GitHub you can watch the repository. Changes to the repo will then appear on your dashboard when you're logged in
  • If you don't have an account on GitHub you can bookmark the commits page. New messages there mean new updates to the repo
  • You can subscribe to the Atom (like RSS) feed. Any updates to the repo will then appear in your feed reader of choice
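If you'd rather script the feed than use a reader, Atom is plain XML and easy to parse with Python's standard library. The sketch below lists the update time and title of each entry in a feed document. Fetching the feed is left out, because its exact URL isn't given here; GitHub serves commit feeds at addresses of the form https://github.com/OWNER/REPO/commits/BRANCH.atom, with the placeholders filled in. The function name `feed_entries` is hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def feed_entries(atom_xml: str) -> list[tuple[str, str]]:
    """Return (updated, title) pairs for each <entry> in an Atom feed, in feed order."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        entries.append((updated.strip(), title.strip()))
    return entries
```

Comparing the newest `updated` value with the one you saw last time tells you whether the repo has new commits.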

History of changes in 2015

While there are now two CSV data files, one for 2015 and one for 2016, the Guardian's ZIP file originally contained only one, data/the-counted.csv. On 4 February 2016 that file was renamed to data/the-counted-2015.csv. Constraints in the Git version control software mean the full commit history isn't available for the new file, but you can see the deleted file's history, up to 3 February 2016, on GitHub. If you're a command-line aficionado you can clone the repo and use git log --follow -- data/the-counted.csv.

Support