This project consists of a statistical analysis of a large traffic accidents dataset [1, 2] using Spark.
It was developed in the Google Colab environment. For this purpose, both the Jupyter Notebook and the dataset are hosted on Google Drive.
You can run the code in my hosted notebook, or upload the code and set up your own working environment by following the steps below.
Set up the environment by performing the following steps:
- Create the `Colab Notebooks` and `Colab Datasets` folders in your Google Drive space.
- Import the `USAccidents.ipynb` Jupyter Notebook into your `Colab Notebooks` folder.
- Download the USAccidents dataset and import it into your `Colab Datasets` folder.
- Open the Jupyter Notebook in Google Colab.
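
Once the notebook is open, the setup above can be checked with a short cell. This is only a minimal sketch, not code taken from the notebook: it assumes Colab's usual mount point `/content/drive` and the `MyDrive` root folder, and the helper names `project_paths`, `mount_drive`, and `missing_items` are hypothetical.

```python
from pathlib import Path

# Assumption: Colab mounts Google Drive under /content/drive/MyDrive.
DRIVE_ROOT = Path("/content/drive/MyDrive")


def project_paths(drive_root=DRIVE_ROOT):
    """Return the locations that the setup steps are expected to create."""
    return {
        "notebook": drive_root / "Colab Notebooks" / "USAccidents.ipynb",
        "datasets": drive_root / "Colab Datasets",
    }


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def missing_items(paths):
    """List the names of expected paths that do not exist yet."""
    return [name for name, path in paths.items() if not path.exists()]


# Report anything the setup steps have not created yet (Colab only).
if mount_drive():
    for name in missing_items(project_paths()):
        print(f"Missing: {name}")
```

If nothing is printed in Colab, both the notebook and the dataset folder are where the steps above put them. Outside Colab, `mount_drive()` returns `False` and the check is skipped.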
GitHub's Jupyter Notebook renderer does not display the plots generated by plot.ly.
[1] Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, and Rajiv Ramnath. “A Countrywide Traffic Accident Dataset.” arXiv preprint arXiv:1906.05409 (2019).
[2] Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, Radu Teodorescu, and Rajiv Ramnath. “Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights.” In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM, 2019.