This repo contains work performed by Jackie Nichols, Hao Wu, Song Park, and Robert Link for our capstone project in W210 at UC Berkeley.
Greater access to data has recently brought sentencing concerns within the criminal justice system to light. For example, some recent research has shown that African Americans face harsher treatment than whites in the criminal justice system, and minority groups in general tend to face harsher sentencing, including longer prison sentences. Equity in the criminal justice system is in question, and data are now available that help provide greater transparency into it. Drawing inspiration from the following:
- the award-winning "Bias on the Bench" 2016 series from the Sarasota Herald-Tribune (http://projects.heraldtribune.com/bias/)
- the story of Walter McMillian, who, with the help of defense attorney Bryan Stevenson, appealed his murder conviction, as depicted in the movie Just Mercy
- the team at American Equity and Justice Group, whose goal is to provide transparency into the WA criminal justice system
the DAATE team applied data science techniques, in a 12-week MVP, to Florida Department of Corrections data (2004-2016) to investigate disparity in sentencing.
Defense Attorney Advisory Tool for Equity (DAATE) is motivated by the desire to explore inequity in the US criminal justice system. That desire grew out of research showing that Black Americans are incarcerated at approximately five times the rate of White Americans and face harsher treatment in the US criminal justice system, with sentences as much as 19.1% longer than those of White offenders.
The goal of DAATE is to provide greater transparency into the sentencing of Black and White Americans in the United States (US) criminal justice system. Leveraging Florida Department of Corrections data from 2004-2016, DAATE is intended to be an advisory tool that gives defense attorneys additional insight into potential inequity in the Florida legal system through simple yet impactful analytics. In addition to analytics, DAATE provides statistical guidelines for evidence of sentencing bias by exploring causality, along with easily interpretable, model-based predictions of sentence length. It does this by using various data science techniques to identify potential bias in sentences for particular crimes based on a defendant's race.
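
As a rough illustration of the kind of descriptive analytic mentioned above, the sketch below computes the percent gap in mean sentence length between Black and White defendants for each offense. It is a minimal sketch only: the records are synthetic values invented for this example (not Florida DOC data), and the function name `sentence_gap_by_offense` is hypothetical rather than part of the DAATE code.

```python
from collections import defaultdict
from statistics import mean

# Synthetic illustration records: (offense, race, sentence_months).
# These values are invented for the example and are NOT Florida DOC data.
records = [
    ("burglary", "Black", 40), ("burglary", "Black", 44),
    ("burglary", "White", 34), ("burglary", "White", 36),
    ("robbery", "Black", 60), ("robbery", "White", 60),
]


def sentence_gap_by_offense(rows):
    """Return {offense: percent gap in mean sentence, Black relative to White}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for offense, race, months in rows:
        grouped[offense][race].append(months)

    gaps = {}
    for offense, by_race in grouped.items():
        if "Black" not in by_race or "White" not in by_race:
            continue  # both groups are needed for a comparison
        black_mean = mean(by_race["Black"])
        white_mean = mean(by_race["White"])
        gaps[offense] = 100.0 * (black_mean - white_mean) / white_mean
    return gaps


gaps = sentence_gap_by_offense(records)
for offense, gap in sorted(gaps.items()):
    print(f"{offense}: {gap:+.1f}%")
```

A raw gap like this does not account for other case characteristics, which is why DAATE pairs its analytics with causal and predictive models.
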
DAATE is intended to be an advisory tool that gives defense attorneys additional insight into potential inequity in sentencing and aids their case preparation as they represent their clients and seek fair and equal treatment for them.
To empower legal professionals to realize fairness and equity for their clients by providing transparency into sentencing in the US criminal justice system using data science techniques.
The DAATE MVP uses Microsoft Azure cloud infrastructure to host the synchronous pipeline for data ingestion and processing. The Florida Department of Corrections (DOC) data from 2004-2016 is stored in this infrastructure. The backend processes the DOC data through various modelling techniques and is built for high scalability and modularity. Tableau accesses the data and modelling results, which are served on this site via GitHub Pages. The pipeline consists of multiple stages:
- Ingest the Florida Department of Corrections (DOC) data from 2004-2016 into Azure Blob Storage
- Use Databricks to move the data into an Azure SQL DB table
- Use Elasticsearch, Azure ML, and Python to perform EDA and create the MVP Azure SQL table
- Run the Bias and Disparity Detection Engine API in a Docker container, updating the MVP Azure SQL table
- Perform multi-modelling in Azure ML and Python, updating the MVP Azure SQL table
- Use Tableau to access the MVP Azure SQL table and create the dashboards
- Use GitHub Pages to serve the DAATE website
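
The flow of the stages above (ingest raw data, load it into a SQL table, write model output back to that table, then query it for a dashboard) can be sketched locally. This is a minimal sketch under stated assumptions, not the DAATE implementation: it uses Python's built-in `sqlite3` as a stand-in for Azure SQL, an inline CSV string as a stand-in for Blob Storage, and a per-offense mean as a placeholder for the real models. The table name `mvp`, the column names, and all values are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a raw DOC extract; columns and values are invented.
RAW_CSV = """case_id,circuit,offense,sentence_months
1,11,burglary,40
2,11,burglary,36
3,6,robbery,60
"""


def ingest(conn, raw_csv):
    """Ingestion stages: parse raw rows and load them into a SQL table."""
    conn.execute(
        "CREATE TABLE mvp ("
        "case_id INTEGER PRIMARY KEY, circuit INTEGER, offense TEXT, "
        "sentence_months REAL, predicted_months REAL)"
    )
    rows = [
        (int(r["case_id"]), int(r["circuit"]), r["offense"],
         float(r["sentence_months"]))
        for r in csv.DictReader(io.StringIO(raw_csv))
    ]
    conn.executemany(
        "INSERT INTO mvp (case_id, circuit, offense, sentence_months) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )


def add_model_output(conn):
    """Modelling stage: write a placeholder prediction back to the table.

    The per-offense mean used here is only a placeholder for a real model.
    """
    conn.execute(
        "UPDATE mvp SET predicted_months = ("
        "SELECT AVG(m2.sentence_months) FROM mvp AS m2 "
        "WHERE m2.offense = mvp.offense)"
    )


def dashboard_query(conn):
    """Dashboard stage: the kind of read-only query a BI tool might issue."""
    return conn.execute(
        "SELECT case_id, offense, sentence_months, predicted_months "
        "FROM mvp ORDER BY case_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
ingest(conn, RAW_CSV)
add_model_output(conn)
result = dashboard_query(conn)
for row in result:
    print(row)
```

Keeping each stage as a separate function that reads from and writes to one shared table mirrors the modular design described above, where every stage updates the same MVP table.
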
For a look at our MVP, please visit our website at https://mspuckit.github.io/DAATE/. Its Try It section has four dashboards with our MVP results on sentencing disparity:
- Explore Florida Department of Corrections Data: Contains analytics as well as Causal and Predictive Model Results
- Florida Sentencing Model Results: Use this dashboard to view overall causal and predictive model results for a subset of Florida circuits
- Florida Sentencing Details by Judge: Use this dashboard to select a judge and see the sentencing data associated with that judge
- Florida Sentencing Bias & Disparity Detection Engine Results: See the results from leveraging the Bias Detection Engine on Miami data for comparison to DAATE results.
Below is a list of directories found in this repository along with a brief description.
Directory | Description |
---|---|
MODELS | Contains all of the notebooks for the various models |
DELIVERABLES | Contains the final deliverables for this project |
EDA & DATA CLEANSING | Contains notebooks for EDA & data cleansing |
ASSETS | Website specific files |
Below is a list of notable files found in this repository along with a brief description.
File | Description |
---|---|
WEB SITE | |
index.html | Home page for DAATE |
acknowledgment.html | Acknowledgement page for DAATE. |
arch-details.html | Architecture details page for DAATE. |
data-details.html | Data details page for DAATE. |
model-details.html | Model details page for DAATE. |
results-details.html | Results details page for DAATE. |
references.html | List of references used during the creation of DAATE. |
whatsnext-details.html | Next Steps details page for DAATE. |
terms.html | Terms of Use page for DAATE. |
privacy.html | Privacy Policy for DAATE. |
tryit_bdde.html | BDDE Dashboard results for DAATE. |
tryit_caselist.html | Judge case list for DAATE. |
tryit_modelresults.html | Model results for DAATE. |
tryit_one.html | Analytic results for DAATE. |
MODEL ANALYSIS | |
Notebooks/ModelAnalysis.ipynb | Analysis of dataset |
Notebooks/aligning_and_balancing_multiple_datasets | Code file to balance datasets. |
Notebooks/Causal_finalize.ipynb | Causal Inference result generation |
Notebooks/regression_circuitsXcrimes.ipynb | Predictions Circuit Level |
Notebooks/regression_withJudge.ipynb | Predictions Judge Level |
DELIVERABLES | |
deliverables/DAATE_Pres1.PDF | PDF of presentation 1. |
deliverables/DAATE_Pres2.PDF | PDF of presentation 2. |
deliverables/DAATE_Pres3.PDF | PDF of presentation 3. |
EDA & DATA CLEANSING | |
Data_Cleaning_Outliers_ZScore.ipynb | Data cleansing process & removing outliers |
Initial_EDA_sentencing.ipynb | Data Exploration & Analysis |
JudgeCleanups.ipynb | Messy Judge Name Cleanups |
REFERENCES | |
References.pdf | A list of references used throughout the DAATE MVP journey |
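
The table above lists Data_Cleaning_Outliers_ZScore.ipynb as the notebook that removes outliers. For readers unfamiliar with the technique its name suggests, here is a minimal sketch of z-score outlier filtering. It is an assumption-laden illustration, not the notebook's code: the function name `remove_zscore_outliers`, the threshold, and the sample values are all hypothetical.

```python
from statistics import mean, pstdev


def remove_zscore_outliers(values, threshold=3.0):
    """Drop values whose absolute z-score exceeds `threshold`.

    The z-score is (value - mean) / population standard deviation.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return list(values)  # all values identical, so nothing is an outlier
    return [v for v in values if abs(v - mu) / sigma <= threshold]


# Invented sentence lengths in months; 600 is the planted outlier.
sentences = [12, 14, 15, 13, 16, 14, 15, 13, 14, 600]
cleaned = remove_zscore_outliers(sentences, threshold=2.5)
print(cleaned)
```

The example passes a threshold of 2.5 rather than the common default of 3 because, with only ten values and the population standard deviation, no z-score can exceed sqrt(n - 1) = 3, so a cutoff of 3 would never remove anything in a sample this small.
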