/Machine_Learning_Predict_Violators

Using machine learning to predict High Priority Violators of the Clean Air Act

ML_Project

ICIS-Air contains emissions, compliance, and enforcement data on stationary sources of air pollution. Regulated sources cover a wide spectrum, from large industrial facilities to relatively small operations such as dry cleaners. (Automobiles and other mobile air pollution sources are tracked by a different EPA system.)

We use the ICIS-Air dataset to predict Clean Air Act violations.

Inside the ICIS_explore2 file you can find some very basic graphs that will give you a sense of the distribution of our data.

In the ICIS-Output&Features.ipynb or ICIS-Output_Features.py file you can run a version of our 'magic loop', which shows the output and evaluation metrics of the machine learning models we experimented with over different time intervals between January 1, 2000 and January 1, 2017. We used these results to choose the model that best addresses our selected policy problem: helping the EPA improve its Clean Air Act inspection schedule in an effort to prevent further violations. To run the functions in this file, visit https://echo.epa.gov/tools/data-downloads, download the ICIS-AIR_downloads zip file, and place that folder inside this directory.
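For readers unfamiliar with the idea, here is a minimal sketch of what a 'magic loop' over time intervals can look like. It is not the code in ICIS-Output_Features.py: the names `temporal_splits`, `magic_loop`, and `evaluate` are hypothetical, and the expanding one-year windows are an assumption about how the 2000 to 2017 range might be divided.

```python
from datetime import date


def temporal_splits(start, end, test_years=1):
    """Return (train_start, train_end, test_end) tuples.

    Assumption: an expanding training window that always begins at `start`,
    followed by a test window of `test_years` years. Dates are assumed to
    fall on a day that exists in every year (e.g. January 1).
    """
    splits = []
    train_end = start.replace(year=start.year + 1)
    while True:
        test_end = train_end.replace(year=train_end.year + test_years)
        if test_end > end:
            break
        splits.append((start, train_end, test_end))
        train_end = test_end
    return splits


def magic_loop(models, splits, evaluate):
    """Run every model on every temporal split and collect one row per run.

    `models` maps a label to a zero-argument factory that builds a fresh model.
    `evaluate` is a caller-supplied (hypothetical) function that trains on
    [train_start, train_end), tests on [train_end, test_end), and returns a
    dict of metrics.
    """
    results = []
    for name, make_model in models.items():
        for train_start, train_end, test_end in splits:
            metrics = evaluate(make_model(), train_start, train_end, test_end)
            results.append(
                {"model": name, "train_end": train_end, "test_end": test_end, **metrics}
            )
    return results


if __name__ == "__main__":
    splits = temporal_splits(date(2000, 1, 1), date(2017, 1, 1))

    # Placeholder evaluation that only reports the window sizes; a real one
    # would fit the model and compute metrics such as precision or recall.
    def window_sizes(model, train_start, train_end, test_end):
        return {
            "train_days": (train_end - train_start).days,
            "test_days": (test_end - train_end).days,
        }

    rows = magic_loop({"placeholder": lambda: None}, splits, window_sizes)
    for row in rows[:3]:
        print(row)
```

Splitting by date rather than at random keeps each test window strictly after its training window, which matches how an inspection schedule would be used in practice: a model trained on past records is asked about future violations.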