- The aim of this project was to analyse car accidents that have occurred in the UK in the period 2005-2015 to give insight into the question: "How can the UK government use its collected data to improve road safety?"
- The analysis was carried out by a group (Dataholics) for Practical Business Analytics, a module provided by the University of Surrey. The project was completed within 3 weeks.
-
Predicting_Accidents_Report.pdf
- The comprehensive report on the analysis, which was produced with the following R scripts.
-
Functions_Dataholics.R
- Contains mainly functions we wrote ourselves. Also contains some of the functions written by Prof. Nick F Ryman-Tubb and presented to us in the computer labs.
-
main.R
- Executes the program. Defines each required filename as a global variable, reads in the functions file, installs the required packages and runs the different parts of the program. You will be asked to enter your JAVA_HOME path. On Windows there was a slight issue with the file path; changing the path to the form "C:/path_to/Java/jre_version" fixes the issue should it occur.
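
The path fix above can be sketched as follows. This is a minimal illustration, not the code in main.R: it assumes the Windows issue comes from backslashes in the path, and `set_java_home` is a hypothetical helper name.

```r
# Hypothetical helper (not part of the project): rewrite a Windows-style
# path with forward slashes and register it as JAVA_HOME for this R session.
set_java_home <- function(path) {
  # "\\" is a single literal backslash in an R string; fixed = TRUE
  # makes gsub treat it as plain text rather than a regular expression
  clean_path <- gsub("\\", "/", path, fixed = TRUE)
  Sys.setenv(JAVA_HOME = clean_path)
  invisible(clean_path)
}

# Placeholder path taken from the description above, not a real installation
set_java_home("C:\\path_to\\Java\\jre_version")
print(Sys.getenv("JAVA_HOME"))
```

Setting the variable with `Sys.setenv` only affects the running R session, so it has to be repeated (or entered at the prompt) each time the program is started.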
-
Cleaning.R
- Reads in the original data files, drops unneeded columns and drops entries with NULL values or certain previously agreed values. Produces a bar graph for each parameter and compares the distribution of Accident_Severity between the original and the cleaned data set. (Steps are described inside the report.)
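
A rough sketch of these cleaning steps on toy data is given below. The column names other than Accident_Severity, the list of unneeded columns and the "invalid" code are assumptions made for illustration. Missing values are represented as NA, which may differ from how the original files encode them.

```r
# Toy data standing in for the accident records (values are made up)
accidents <- data.frame(
  Accident_Severity  = c(1, 2, 3, 3, 3, 2, 3, 3),
  Weather_Conditions = c(1, 2, NA, 1, 9, 1, 2, 1),
  Police_Force       = c(1, 1, 2, 2, 3, 3, 4, 4)
)

# Assumptions: which columns are unneeded and which codes count as invalid
unneeded_cols   <- c("Police_Force")
invalid_weather <- c(9)

# Drop unneeded columns, rows with missing values, and rows with invalid codes
cleaned <- accidents[, !(names(accidents) %in% unneeded_cols), drop = FALSE]
cleaned <- cleaned[complete.cases(cleaned), , drop = FALSE]
cleaned <- cleaned[!(cleaned$Weather_Conditions %in% invalid_weather), , drop = FALSE]

# Share of each severity level before and after cleaning
severity_levels <- sort(unique(accidents$Accident_Severity))
share <- function(x) {
  as.numeric(prop.table(table(factor(x, levels = severity_levels))))
}
comparison <- rbind(original = share(accidents$Accident_Severity),
                    cleaned  = share(cleaned$Accident_Severity))
colnames(comparison) <- severity_levels

# Side-by-side bars make a shift in the distribution easy to spot
barplot(comparison, beside = TRUE, legend.text = TRUE,
        xlab = "Accident_Severity", ylab = "Share of records")
```

Comparing shares rather than raw counts keeps the two data sets comparable even though cleaning removes rows.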
-
Combining_Aggregating.R
- Changes the Junction_Control value based on the value of Junction_Detail and combines some values of certain columns. Also delivers a comparison of Accident_Severity between the original and the cleaned data sets. (Steps are described inside the report.)
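
The kind of recoding described here might look like the sketch below. The actual rules are documented in the report. The numeric codes, the Light_Conditions column and the choice of which values to merge are all assumptions used only for illustration.

```r
# Toy data; all codes below are assumed, not the project's actual mapping
accidents <- data.frame(
  Junction_Detail  = c(0, 3, 6, 0, 1),
  Junction_Control = c(-1, 4, 2, -1, 4),
  Light_Conditions = c(1, 4, 5, 6, 7)
)

# Assumed rule: where Junction_Detail indicates "not at a junction"
# (assumed code 0), give Junction_Control an explicit code (hypothetical 0)
no_junction <- accidents$Junction_Detail == 0
accidents$Junction_Control[no_junction] <- 0

# Assumed aggregation: merge several darkness codes into a single level
darkness_codes <- c(4, 5, 6, 7)
is_dark <- accidents$Light_Conditions %in% darkness_codes
accidents$Light_Conditions[is_dark] <- 4

print(accidents)
```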
-
Prediction_Parameter.R
- You will be asked whether you want to run your own clustering or just the hard-coded clustering we prepared. Typing "y" and pressing ENTER runs your own k-means calculation for 2 to 15 clusters, 3 times, and plots the variances for the different cluster numbers. A k-means clustering is then performed at the local minimum of 8 clusters with 2500 starting points, which produces a clustering similar to the one we chose and hard-coded. The hard-coded clustering is executed afterwards.
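
The clustering procedure described above can be sketched with base R's `kmeans`. The data here is synthetic, the total within-cluster sum of squares is used as the "variance" measure (an assumption about what the script plots), and `nstart` is kept small so the example runs quickly.

```r
set.seed(42)  # reproducibility of this sketch only

# Synthetic numeric features standing in for the prepared accident data
features <- scale(matrix(rnorm(600), ncol = 3))

k_values <- 2:15
n_runs   <- 3

# Total within-cluster sum of squares for every k, repeated n_runs times;
# the result is a matrix with one row per k and one column per run
wss <- sapply(seq_len(n_runs), function(run) {
  sapply(k_values, function(k) {
    kmeans(features, centers = k, iter.max = 50)$tot.withinss
  })
})

# One line per run: within-cluster variance against the number of clusters
matplot(k_values, wss, type = "b", pch = 1,
        xlab = "Number of clusters k",
        ylab = "Total within-cluster sum of squares")

# Final clustering with the chosen k; the description above uses
# 2500 starting points, reduced here to keep the run time short
final <- kmeans(features, centers = 8, nstart = 25, iter.max = 50)
print(table(final$cluster))
```

Repeating the sweep several times shows how much the curve depends on the random starting centres. A large `nstart` for the final run reduces that dependence.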
-
TimeAnalysis_Hyp2.R
- Executes the analysis on the time data (Hypothesis 2 of the report).
-
AgeAnalysis_Hyp3.R
- Running k-means clustering for different values of k increases the run time significantly (total runtime ~10 mins). To enable it, first ensure the global variable "Multi_k_Plot" is set to TRUE, then type "y" when prompted. To disable it, set the variable to FALSE and press ENTER when prompted.
-
EnvironmentAnalysis_Hyp1_Regression.R
- Executes a multivariate linear regression, a MARS regression and a MARS regression with cross-validation on both the Accident_Severity and the Custom_Severity created in Prediction_Parameter.R.
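
As a rough illustration of the three model types, the sketch below fits them to synthetic data. The predictor names are hypothetical. Using the `earth` package for MARS is an assumption, since the description does not say which package the script relies on, and the MARS part is skipped if `earth` is not installed.

```r
set.seed(1)

# Synthetic stand-in data; predictor names are hypothetical
n <- 200
env <- data.frame(
  Speed_limit = sample(c(20, 30, 40, 50, 60, 70), n, replace = TRUE),
  Light       = runif(n)
)
env$Accident_Severity <- 3 - 0.01 * env$Speed_limit +
  0.5 * pmax(env$Light - 0.5, 0) + rnorm(n, sd = 0.2)

# Linear regression on all predictors
lm_fit <- lm(Accident_Severity ~ ., data = env)
print(summary(lm_fit)$r.squared)

# MARS via the earth package (an assumption, see the note above)
if (requireNamespace("earth", quietly = TRUE)) {
  # Plain MARS fit
  mars_fit <- earth::earth(Accident_Severity ~ ., data = env)
  # Same model with 5-fold cross-validation
  mars_cv <- earth::earth(Accident_Severity ~ ., data = env, nfold = 5)
  print(summary(mars_cv))
}
```

The same calls would be repeated with Custom_Severity as the response to cover the second target.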