This is our final project for the UChicago class CAPP 30254, Machine Learning for Public Policy. For this project, we use machine learning to predict the block groups in Chicago where eviction risk is highest (in the top 10%) in the next 3 years.
Nora Hajjar
Lilian Huang
Peter Li
Kyle Schindl
This is the list of Python libraries needed to run our code (note that `datetime`, `functools`, and `csv` are part of the Python standard library; the rest must be installed):
- numpy
- pandas
- sodapy
- datetime
- census
- functools
- geopandas
- shapely
- statsmodels
- matplotlib
- seaborn
- csv
- scikit-learn
- aequitas
These files are all in the raw_data directory:
- block-groups.csv, the original evictions dataset, manually downloaded from the Eviction Lab
- cb_2017_17_bg_500k, a block-group shapefile, manually downloaded from the Census Bureau
- HOLC_Chicago, a redlining shapefile, manually downloaded from the University of Richmond's Mapping Inequality project
- chicago_blocks.csv, a dataset containing all census blocks in Chicago, manually downloaded from the City of Chicago's Open Data Portal
- Crimes_-_2001_to_present.csv, crime report data, manually downloaded from the City of Chicago's Open Data Portal
We also made use of American Community Survey estimates, but these were accessed through an API (in Notebook_cleaning.ipynb) rather than being downloaded manually.
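For reference, the `census` package used in Notebook_cleaning.ipynb wraps the Census Bureau's REST API; the underlying request looks roughly like the stdlib-only sketch below. The year, variable codes, and column names here are illustrative, not necessarily the exact ones the notebook requests.

```python
from urllib.parse import urlencode

def acs5_blockgroup_url(year, variables, state_fips, county_fips, api_key):
    """Build an ACS 5-year block-group query URL (illustrative sketch)."""
    base = f"https://api.census.gov/data/{year}/acs/acs5"
    params = {
        "get": ",".join(variables),  # e.g. B01003_001E = total population
        "for": "block group:*",      # all block groups...
        "in": f"state:{state_fips} county:{county_fips}",  # ...in one county
        "key": api_key,
    }
    return f"{base}?{urlencode(params)}"

# Cook County, Illinois: state FIPS 17, county FIPS 031
url = acs5_blockgroup_url(2017, ["NAME", "B01003_001E"], "17", "031", "YOUR_KEY")
```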
These files are all in the code directory:
- Notebook_cleaning.ipynb, a notebook in which we clean, update, and merge various sources of data
- go.py and final_pipeline_for_vm.py, which run the machine learning pipeline/models
- Evaluation.ipynb and Model Comparisons.ipynb, notebooks used for model evaluation and comparison
- Plots.ipynb, a notebook that creates descriptive plots based on the data
- Aequitas.ipynb, a notebook that uses Aequitas to analyze bias in the models
These files are all in the output_files directory. They are intermediate or output files generated by running our code, but copies are included here for reference as well.
- full_data_chicago.csv is produced by running Notebook_cleaning.ipynb. It is the dataset that is then read into go.py for final processing and for building the machine learning models. It is also used in Plots.ipynb to generate descriptive plots.
- results.csv is produced by running go.py. It contains the full list of machine learning models we trained and tested, and the performance metrics for each of them. It is then read into Evaluation.ipynb and Model Comparisons.ipynb to evaluate and compare our models.
- train.csv and test.csv are produced by running go.py. They are the training and test sets used to fit our chosen best model so that we can generate our final predictions. They are used in Evaluation.ipynb and Aequitas.ipynb to evaluate the performance and bias of our chosen best model.
- block_groups_intervene.csv is our final generated list of predictions, i.e. the block groups where intervention should be applied, as predicted by our chosen best model.
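The top-10% risk label that these predictions target can be built with a simple quantile cutoff. A minimal pandas sketch, where the column names are hypothetical rather than the ones in full_data_chicago.csv:

```python
import pandas as pd

def label_top_decile(df, rate_col="eviction_rate", label_col="high_risk"):
    """Flag rows at or above the 90th percentile of rate_col with 1, else 0."""
    cutoff = df[rate_col].quantile(0.9)
    out = df.copy()
    out[label_col] = (out[rate_col] >= cutoff).astype(int)
    return out

# Toy example: ten block groups; with these values only the highest rate
# (3.1) clears the interpolated 90th-percentile cutoff.
toy = pd.DataFrame(
    {"eviction_rate": [0.5, 1.2, 0.3, 2.0, 0.8, 0.1, 1.5, 0.9, 0.4, 3.1]}
)
labeled = label_top_decile(toy)
```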