/Labs25-Human_Rights_First-TeamC-DS

Team C Data Science Repo for the Human Rights First Police Use of Force Tracker

Primary language: Jupyter Notebook. License: MIT.

Human Rights Considered - Data Science Backend

Human Rights First is a 501(c)(3) international independent advocacy and action organization that challenges America to live up to its ideals. We believe American leadership is essential in the global struggle for human rights, so we press the U.S. government and private companies to respect human rights and the rule of law. When they fail, we step in to demand reform, accountability and justice. Around the world, we work where we can best harness American influence to secure core freedoms.

Human Rights Considered is a project for Human Rights First that tracks incidents of police use of force on Americans. Our initial goal was to develop a visualization that showcases instances of police use of force, along with a data science model that helps classify possible instances of brutality. We quickly realized that, in addition to creating a model to assess use of force, our highest-priority data science task was to source and process the relevant data, create a database, and serve it through an accessible API.

Disclaimer: This application is currently in Alpha (as of Sep 20, 2020) and is not ready for production. Please use at your own risk.

DS Contributors

Axel Corro, Michelle Hottinger, Miriam Ali

This project's front end repository can be found here.

Tech Stack

Python Packages

  • Pandas
  • Snorkel
  • GeoPy
  • NLTK
  • Scikit-learn
  • Psycopg2

DevOps

  • Docker
  • PostgreSQL
  • SQLAlchemy
  • AWS CloudWatch
  • AWS Lambda
  • AWS Elastic Beanstalk
  • FastAPI

Overview

Data

We currently use data from Police Brutality 2020 (PB2020), which primarily sources its data from Reddit posts. The PB2020 data as of August 2020 was used to train our model and seed our database. New incidents and evidence from PB2020 will also be added to the database via a cron job executed by AWS Lambda. One of our goals for future releases is to include more dynamic social media scraping, for example from Twitter.

Processing and Model

Incident data was cleaned, and location metadata was added to each incident with a geocoder. To create a model that predicts which type of force was deployed, we first built a training dataset with Snorkel, using a weakly supervised learning approach.
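To illustrate the idea behind weak supervision, here is a minimal sketch in plain Python. It does not use Snorkel's API and is not the project's actual code: the keyword rules, the function names, and the simple majority vote are all assumptions made for illustration. Snorkel itself combines labeling-function votes with a learned label model rather than a plain vote.

```python
from collections import Counter

ABSTAIN = None  # returned by a labeling function that has no opinion


# Hypothetical keyword rules; the project's real labeling functions may differ.
def lf_projectiles(text):
    keywords = ("rubber bullet", "bean bag", "pepper ball")
    return "Projectiles" if any(k in text for k in keywords) else ABSTAIN


def lf_chemical(text):
    keywords = ("tear gas", "pepper spray", "mace")
    return "Chemical" if any(k in text for k in keywords) else ABSTAIN


def lf_blunt_impact(text):
    keywords = ("baton", "club", "shield")
    return "Blunt Impact" if any(k in text for k in keywords) else ABSTAIN


LABELING_FUNCTIONS = [lf_projectiles, lf_chemical, lf_blunt_impact]


def weak_label(description):
    """Return the most common label among non-abstaining functions, or None."""
    text = description.lower()
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    # Ties are broken by the order in which the votes were cast.
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    print(weak_label("Police fire tear gas at protesters"))
```

Descriptions that no rule matches stay unlabeled, which is the usual trade-off of this approach: noisy, cheap labels on part of the data instead of hand labels on all of it.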

For more information on our data cleaning process, how we used Snorkel, and our model, see our machine learning readme.

Database Schema

For more information, see our database readme.

API Endpoints

Route: /incidents

Method: GET
Description:

Read all incidents of police use of force. Each incident is identified by a unique id, e.g. ca-sanfrancisco-1.

Schema:
[
  {
    "id": "string",
    "place_id": 0,
    "descr": "string",
    "date": "string",
    "evidences": [
      {
        "incident_id": "string",
        "link": "string",
        "id": 0
      }
    ],
    "tags": [
      {
        "incident_id": "string",
        "tag": "string",
        "id": 0
      }
    ],
    "place": {
      "city": "string",
      "state_name": "string",
      "state_code": "string",
      "latitude": "string",
      "longitude": "string",
      "id": 0
    }
  }
]
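As a rough sketch of how a client might consume this response, the snippet below parses a payload shaped like the schema above into small dataclasses. The class names, the `parse_incidents` helper, and every value in `SAMPLE` other than the id format are invented for illustration. Converting latitude and longitude to floats is also an assumption, since the schema types them as strings.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Place:
    city: str
    state_code: str
    latitude: float
    longitude: float


@dataclass
class Incident:
    id: str
    descr: str
    date: str
    tags: List[str]
    links: List[str]
    place: Place


def parse_incidents(payload: str) -> List[Incident]:
    """Turn the JSON text returned by /incidents into Incident objects."""
    incidents = []
    for raw in json.loads(payload):
        p = raw["place"]
        place = Place(
            city=p["city"],
            state_code=p["state_code"],
            latitude=float(p["latitude"]),    # schema delivers these as strings
            longitude=float(p["longitude"]),
        )
        incidents.append(Incident(
            id=raw["id"],
            descr=raw["descr"],
            date=raw["date"],
            tags=[t["tag"] for t in raw["tags"]],
            links=[e["link"] for e in raw["evidences"]],
            place=place,
        ))
    return incidents


# Placeholder payload: all values are made up and only mirror the schema.
SAMPLE = json.dumps([{
    "id": "ca-sanfrancisco-1",
    "place_id": 1,
    "descr": "placeholder description",
    "date": "2020-06-01",
    "evidences": [{"incident_id": "ca-sanfrancisco-1",
                   "link": "https://example.com/evidence", "id": 1}],
    "tags": [{"incident_id": "ca-sanfrancisco-1", "tag": "Projectiles", "id": 1}],
    "place": {"city": "San Francisco", "state_name": "California",
              "state_code": "CA", "latitude": "37.77",
              "longitude": "-122.42", "id": 1},
}])

if __name__ == "__main__":
    for incident in parse_incidents(SAMPLE):
        print(incident.id, incident.tags, incident.place.city)
```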

Route: /incidents/{tag}

Method: GET
Description:

Read incidents by tag. For example: /incidents/projectiles

Sortable Tags:

  • Blunt Impact
  • Chemical
  • EHC Soft Technique
  • EHC Hard Technique
  • Projectiles

Schema:

Same as the /incidents endpoint above.
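The effect of filtering by tag can be sketched client-side on records shaped like the /incidents schema. This is not the server's implementation. The `incidents_with_tag` helper and the case-insensitive match are assumptions, suggested only by the lowercase `projectiles` in the example route, and the sample records are placeholders.

```python
def incidents_with_tag(incidents, tag):
    """Return the incidents that carry the given tag, ignoring case."""
    wanted = tag.strip().lower()
    return [
        incident for incident in incidents
        if any(t["tag"].lower() == wanted for t in incident.get("tags", []))
    ]


# Placeholder records that follow the schema's "id" and "tags" fields only.
EXAMPLE_INCIDENTS = [
    {"id": "ca-sanfrancisco-1", "tags": [{"tag": "Projectiles"}]},
    {"id": "ca-sanfrancisco-2", "tags": [{"tag": "Chemical"},
                                         {"tag": "Blunt Impact"}]},
]

if __name__ == "__main__":
    matches = incidents_with_tag(EXAMPLE_INCIDENTS, "projectiles")
    print([incident["id"] for incident in matches])
```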

Route: /cron_update

Method: POST
Description:

Endpoint for the cron job which updates the database with new incidents and evidence from PB2020.

Schema:
WIP

See the cron readme.
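Since the schema for this endpoint is still a work in progress, the following is only a sketch of the kind of merge such an update could perform: add incidents whose id is not yet known, and attach evidence links that are not yet stored. The in-memory dictionary standing in for the database, the `merge_update` name, and the record shape are all hypothetical.

```python
def merge_update(db, fetched):
    """Merge fetched records into db and return (new_incidents, new_links).

    db maps an incident id to the set of evidence links already stored.
    fetched is a list of {"id": str, "links": [str, ...]} records.
    """
    new_incidents = 0
    new_links = 0
    for record in fetched:
        links = db.get(record["id"])
        if links is None:
            # Unknown id: create the incident with no evidence yet.
            links = db[record["id"]] = set()
            new_incidents += 1
        for link in record["links"]:
            if link not in links:
                links.add(link)
                new_links += 1
    return new_incidents, new_links


if __name__ == "__main__":
    db = {"ca-sanfrancisco-1": {"https://example.com/a"}}
    fetched = [
        {"id": "ca-sanfrancisco-1",
         "links": ["https://example.com/a", "https://example.com/b"]},
        {"id": "ca-sanfrancisco-2", "links": ["https://example.com/c"]},
    ]
    print(merge_update(db, fetched))
```

Keying on the incident id keeps the update idempotent: running the same fetch twice adds nothing the second time, which matters for a job that runs on a schedule.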