Collaborators: Sarah Liang, George Weale, Ryan Lee, Ryan Yee
The goal of this group project is to provide insights to potential political campaign moves for U.S. swing states supported by big data analysis of voter turnout in Georgia.
The following listed files include code and model building steps. The entire project is neatly reported and presented here.
We developed a logistic regression model using GCP and PySpark to analyze voter turnout data, achieving an AUC of 0.612. From this information, we recommended targeting 18-35 and low-income demographics, according to model predictors, to increase voter turnout. Potential future work, can include refining the model, exploring interaction terms, and adding to campaign strategies.
Pandas | statsmodels | matplotlib | NumPy |
PySpark | seaborn | sklearn |
Written report with abstract, project objectives, exploratory/preliminary analysis, further methodology and analysis, results, and conclusion.
This is a Jupyter notebook with code cells and commentary cells corresponding to the plots in the final pdf report.
Python script file with code for all plots conducted (report plots plus some).
Python script file with code for model building and updating using PySpark SQL and PySpark ML features.