election-predictions

Predicting election results by county for the 2016 US general election with machine learning in R.

Index

  1. Summary
  2. File Directory
  3. Language and Packages Used
  4. Credits
  5. License

Summary

This project has two goals:

  1. Predicting the 2016 US election results by county with supervised machine learning in R.
  2. Mining interesting association rules that relate to demographics and voting preference in R.

Three supervised machine learning models are used to predict election results from county demographics: K-Nearest Neighbors, Decision Trees, and Artificial Neural Networks. The models are compared on accuracy and precision.
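
To make the two comparison metrics concrete, here is a minimal base R sketch of how accuracy and precision can be computed from a confusion matrix. The labels and the helper `precision_for` are made up for illustration; they are not results or code from this project, and the actual analysis in classification.Rmd may compute the metrics differently.

```r
# Toy labels only (assumption): these are NOT results from this project.
actual    <- factor(c("rep", "rep", "dem", "rep", "dem", "rep", "dem", "rep"),
                    levels = c("dem", "rep"))
predicted <- factor(c("rep", "dem", "dem", "rep", "rep", "rep", "dem", "rep"),
                    levels = c("dem", "rep"))

# Confusion matrix: rows are predicted labels, columns are true labels.
confusion <- table(predicted = predicted, actual = actual)

# Accuracy: share of all counties that were classified correctly.
accuracy <- sum(diag(confusion)) / sum(confusion)

# Precision for one class: of the counties predicted as that class,
# the share that truly belong to it. (Hypothetical helper.)
precision_for <- function(cm, positive) {
  cm[positive, positive] / sum(cm[positive, ])
}

precision_dem <- precision_for(confusion, "dem")
precision_rep <- precision_for(confusion, "rep")

print(confusion)
print(c(accuracy = accuracy, precision_dem = precision_dem, precision_rep = precision_rep))
```

Reporting precision per class matters when the classes are imbalanced: a model can reach high accuracy by favoring the majority class while still having poor precision on the minority class.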

File Directory

  1. data - contains the three data sets used in the analysis (taken from Kaggle, referenced in the credits):
             a. county_facts.csv - Demographic breakdown of each county.
             b. county_facts_dictionary.csv - Dictionary to decode variable names in county_facts.csv.
             c. pres16results.csv - Results of the 2016 election by county.

  2. images - contains visualizations:
             a. decision_tree.png - Decision tree created from modelling process.
             b. model_comparison.png - Comparison of 3 classification models used.
             c. population_trends.png - Population size by voting preference.
             d. voting_trends.png - Voting trends by top 5 normalized demographics.
             e. democrat_arules.png - Scatterplot of Democratic association rules by support and confidence.
             f. republican_arules.png - Scatterplot of Republican association rules by support and confidence.
             g. democrats_grid.png - Color grid of Democratic association rules.
             h. republican_grid.png - Color grid of Republican association rules.

  3. classification - contains the classification files that predict election outcomes based on demographics:
             a. classification.Rmd - R Markdown detailing the classification process, from data cleaning to model creation.
             b. classification.pdf - PDF showing the R code and its output, for easy viewing.

  4. association_rules - contains association rules files:
             a. association_rules.Rmd - R Markdown to mine rules that relate to demographics and voting preference.
             b. association_rules.pdf - PDF showing the R code and its output, for easy viewing.

  5. results.pdf - A full write-up comparing classification and association rules mining in R vs SAS.
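
The rule scatterplots listed above are plotted by support and confidence. As a reminder of what those measures mean, here is a small base R sketch that computes them (plus lift) for a single rule on toy data. The column names `high_density` and `voted_dem`, the data values, and the helper `rule_measures` are all hypothetical; the project itself mines rules with the arules package, which computes these measures internally.

```r
# Toy data (assumption): one row per county, with made-up binary attributes.
counties <- data.frame(
  high_density = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE),
  voted_dem    = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Measures for a rule {lhs} => {rhs}, given two logical vectors.
# (Hypothetical helper, not part of arules.)
rule_measures <- function(lhs, rhs) {
  support    <- mean(lhs & rhs)            # share of all rows matching both sides
  confidence <- sum(lhs & rhs) / sum(lhs)  # share of lhs rows that also match rhs
  lift       <- confidence / mean(rhs)     # confidence relative to the rhs base rate
  c(support = support, confidence = confidence, lift = lift)
}

# Rule: {high_density} => {voted_dem}
measures <- rule_measures(counties$high_density, counties$voted_dem)
print(measures)
```

A lift above 1 suggests the left-hand side makes the right-hand side more likely than its base rate; a lift near 1 suggests the two are roughly independent.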

Language and Packages Used

R is used for all model building. The write-up in results.pdf compares the R results against SAS.

The following packages are used:

# List of packages used
packages <- c("dplyr", "tidyr", "ggplot2", "class", "rpart", "rpart.plot", "neuralnet", "arules",
              "plyr", "mltools", "arulesViz", "plotly", "RCurl")

# Load each package, installing it first if it is not already available
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
    library(p, character.only = TRUE)
  }
}

Credits

  1. Thanks to Ben Hamner for the county_facts.csv and county_facts_dictionary.csv datasets, which were taken from Kaggle.
  2. Thanks to Steve Palley for the pres16results.csv dataset, which was taken from Kaggle.

License

MIT License. Copyright (c) 2019 Ian Jeffries