EPA Data Modeling

Predicting Air Quality with Geospatial Analysis

Researched and executed as my capstone for General Assembly's 12-week Data Science Immersive, this report uses geospatial and time-series analysis to find predictive signal between two disparate sets of governmental data.

Part 1: I perform exploratory data analysis (EDA) and time-series analysis of ozone emissions data to create a target measure of air quality, which Part 2 then tries to predict. As part of this EDA, I also create several visualizations to help make sense of the data; the interactive ones (built with the visualization package Folium) are viewable and clickable here.

Link to Part 1 on GitHub
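To give a rough sense of what "creating a target measure" from ozone readings can look like, here is a minimal sketch using only the Python standard library. It is not the method used in the notebook: the sample readings, the `yearly_targets` helper, and the 0.070 ppm cutoff are all assumptions made up for illustration. It averages readings per site and year, then flags the averages that exceed the cutoff.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ozone readings: (site_id, ISO date, ozone in ppm).
# These values are invented for illustration, not taken from EPA data.
readings = [
    ("site_a", "2015-06-01", 0.061),
    ("site_a", "2015-06-02", 0.075),
    ("site_a", "2016-06-01", 0.082),
    ("site_b", "2015-06-01", 0.040),
    ("site_b", "2016-06-01", 0.044),
]

THRESHOLD_PPM = 0.070  # assumed cutoff, chosen only for this example


def yearly_targets(rows, threshold=THRESHOLD_PPM):
    """Average readings per (site, year) and flag means above the threshold."""
    grouped = defaultdict(list)
    for site, date, ppm in rows:
        year = int(date[:4])  # ISO dates start with the four-digit year
        grouped[(site, year)].append(ppm)

    targets = {}
    for key, values in grouped.items():
        avg = mean(values)
        targets[key] = {"mean_ppm": avg, "high_ozone": avg > threshold}
    return targets


targets = yearly_targets(readings)
for (site, year), target in sorted(targets.items()):
    print(site, year, round(target["mean_ppm"], 3), target["high_ozone"])
```

The continuous `mean_ppm` value could serve as a regression target and the boolean `high_ozone` flag as a classification target, which mirrors the two kinds of modeling described in Part 2.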

Part 2: Using the targets created in Part 1, I apply a series of regression and classification techniques to explore the features in the EPA Toxics Release Inventory dataset and to examine the predictive power of various industries and specific chemicals on air quality.

Link to Part 2 on GitHub

NOTE: To download and run this locally, you'll need to acquire the datasets directly from the EPA, because they are larger than GitHub will allow. Directions for doing so are at the beginning of Part 1.