This project takes wildfire data from the United States Geological Survey (USGS), creates a smoke impact estimate for the city of Corpus Christi, Texas, and compares it to air quality monitoring station data from the United States Environmental Protection Agency (EPA). A final summary of the research can be found in the PDF CB_DATA512_Final_Project_Writeup located in this repository.
Note that the wildfire GeoJSON data is too large to be stored on GitHub, so to replicate this analysis you will have to download it from the link provided below.
Note that all data produced is stored in an intermediate file location for use in other parts of the project. The purpose of the project was not to produce research-grade data sources, so the data is not stored as such.
load_data_notebook.ipynb This notebook reads the USGS wildfire GeoJSON data and calculates the distance from each fire to Corpus Christi.
retrieve_daily_aqi_data.ipynb This notebook retrieves the particulate matter air quality data from monitoring stations in the area surrounding Corpus Christi on the days during fire season.
calc_yearly_air_quality.ipynb This notebook aggregates the particulate matter data collected from the EPA API.
smoke_estimate.ipynb This notebook cleans the wildfire data, generates the smoke estimate for each wildfire, analyzes the results, and generates a prediction of smoke impact in Corpus Christi for the next 25 years. It also analyzes the correlation between the AQI data and the smoke impact to assess the accuracy of the index.
heart_disease_data_cleaning.ipynb This notebook reads in the CDC heart disease mortality data, visualizes it for different population groups, and generates a CSV of mortality rates for people over the age of 65 in Nueces County, where Corpus Christi is located.
correlation_analysis.ipynb This notebook contains correlation analysis between the smoke index and the mortality rates of people 65 and older in Nueces County (where Corpus Christi is located).
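The distance calculation described for load_data_notebook.ipynb can be approximated as shown below. This is a minimal sketch rather than the notebook's own code: it assumes a great-circle (haversine) distance measured from a single representative point per fire, and the city coordinates, constant names, and function names are illustrative assumptions. The notebook itself may measure distance differently, for example against the fire perimeter.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Approximate city-centre coordinates, assumed here for illustration only.
CORPUS_CHRISTI = (27.8006, -97.3964)  # (latitude, longitude) in degrees


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


def distance_to_city(fire_lat, fire_lon, city=CORPUS_CHRISTI):
    """Distance in miles from a fire's representative point to the city."""
    return haversine_miles(fire_lat, fire_lon, city[0], city[1])
```

A spherical model like this is usually adequate for screening fires by distance; an ellipsoidal geodesic would be slightly more accurate.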
Wildfire Data Wildfire data was sourced from USGS and can be found here.
Data citation: Welty, J.L., and Jeffries, M.I., 2021, Combined wildland fire datasets for the United States and certain territories, 1800s-Present: U.S. Geological Survey data release, https://doi.org/10.5066/P9ZXGFY3.
Air Quality Data Air Quality Index data was collected from the Air Quality System API maintained by the EPA. Data was collected within the terms of service by limiting the size of queries and by sleeping 10 seconds between requests to limit their frequency. The documentation for the API can be found here.
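The throttling described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the notebooks: `fetch_throttled` and `fire_season_windows` are hypothetical helpers, `fetch` stands in for whatever function actually issues a request, and splitting the work into one fire-season query per year is an assumed way of keeping each query small.

```python
import time
from datetime import date


def fire_season_windows(first_year, last_year):
    """Return one (start, end) date pair per year covering May 1 to Oct 31."""
    return [(date(year, 5, 1), date(year, 10, 31))
            for year in range(first_year, last_year + 1)]


def fetch_throttled(request_params, fetch, delay_seconds=10.0, sleep=time.sleep):
    """Call fetch(params) for each item, pausing between consecutive calls.

    `fetch` is a caller-supplied function (hypothetical here) that performs
    one request. `sleep` is injectable so the delay can be skipped in tests.
    """
    results = []
    for index, params in enumerate(request_params):
        if index > 0:
            sleep(delay_seconds)  # wait before every request after the first
        results.append(fetch(params))
    return results
```

For example, `fetch_throttled(fire_season_windows(2000, 2021), fetch=my_request_function)` would issue 22 requests with a 10 second pause between each, where `my_request_function` is a placeholder for the real request code.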
Code for reading the USGS GeoJSON and accessing the EPA AQI API, written by Professor David MacDonald at the University of Washington, was used and is noted in the notebooks where it appears.
Research sources are noted in the final write-up, which can also be found in this repository.
full_fire_w_distance.csv This file contains the USGS historical wildfire data, with information about each fire including its distance from Corpus Christi in miles. It is used to generate the smoke impact index.
aqi_data_daily_raw.csv Daily particulate matter data from monitoring stations in a box centered on Corpus Christi with sides of length 50 miles on the days during fire season (May 1 - Oct 31) for the years these stations are active (2000-2021).
annual_avg_particulate_matter.csv Average annual PM2.5 level across all recording stations in the area surrounding Corpus Christi from 2000-2021.
sixtyfive_overall_mortality.csv This data contains the mortality rate for each year for people over the age of 65 in Nueces County from 1999 to 2019.
annual_smoke_impact.csv The annual estimated smoke impact on Corpus Christi from fires after 1960 that were within 650 miles of the city. The year, raw smoke impact score, and scaled impact score are included.
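The 50-mile box described for aqi_data_daily_raw.csv can be sketched as below. This is a rough illustration, not the project's code: it assumes a flat approximation of about 69 miles per degree of latitude, and the function name and dictionary keys are hypothetical.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximate; assumed constant for this sketch


def bounding_box(center_lat, center_lon, side_miles=50.0):
    """Return a lat/lon box with sides of `side_miles`, centered on a point."""
    half_side = side_miles / 2.0
    # One degree of latitude spans roughly the same distance everywhere.
    dlat = half_side / MILES_PER_DEGREE_LAT
    # One degree of longitude shrinks with the cosine of the latitude.
    dlon = half_side / (MILES_PER_DEGREE_LAT * math.cos(math.radians(center_lat)))
    return {
        "min_lat": center_lat - dlat,
        "max_lat": center_lat + dlat,
        "min_lon": center_lon - dlon,
        "max_lon": center_lon + dlon,
    }
```

At Corpus Christi's latitude the box spans slightly more degrees of longitude than of latitude, because lines of longitude converge toward the poles.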