
🚗 Insighters Hack the Crash! at HackUPC 2019: Predicting accident severity


Insighters Hack the Crash! at HackUPC 2019 for McKinsey & Company

We developed a Machine Learning pipeline to predict accident severity and a ChatBot that provides bite-sized advice to drivers!

Winners of McKinsey & Company's challenge at HackUPC 2019 in Barcelona

The submission for the predictions is here

Motivation

The rate of road accidents and fatalities in the UK is very high.

This challenge focuses on predicting accident severity based on the following parameters:

  • Road location
  • Road type and its speed limit
  • Weather conditions
  • Details of vehicles involved in the accident
  • and many more!

Predicting accident severity will help the authorities closely monitor accident-prone road segments and carry out root cause analyses of such accidents.

Data

Accident training data    Vehicle data
137,599                   455,652
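
The vehicle table is larger than the accident table, which suggests several vehicle rows per accident. A minimal sketch of collapsing vehicle rows to one row per accident before joining, assuming both tables share an accident key; the column names `accident_id`, `speed_limit` and `vehicle_age` and the toy values are hypothetical, not taken from the challenge data:

```python
import pandas as pd

# Toy stand-ins for the two tables (hypothetical columns and values).
accidents = pd.DataFrame({"accident_id": [1, 2, 3],
                          "speed_limit": [30, 60, 70]})
vehicles = pd.DataFrame({"accident_id": [1, 1, 2, 3, 3, 3],
                         "vehicle_age": [2, 10, 5, 1, 7, 3]})

# Aggregate the vehicle rows so each accident gets exactly one row.
per_accident = (
    vehicles.groupby("accident_id")
    .agg(n_vehicles=("vehicle_age", "size"),
         mean_vehicle_age=("vehicle_age", "mean"))
    .reset_index()
)

# Left join keeps every accident; validate guards against duplicated keys.
merged = accidents.merge(per_accident, on="accident_id",
                         how="left", validate="one_to_one")
```

The left join keeps accidents that have no matching vehicle rows, which would then show up as missing values to handle during data wrangling.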

Notebooks

  1. Exploratory Data Analysis: inspect all the fields and their values
  2. Data Wrangling:
  3. Data Summarization:
  4. Feature Engineering: handcraft a bunch of features from the fields
  5. Feature Selection & Dimensionality Reduction: PCA, t-SNE, UMAP
  6. Split the data: split the dataset into Train and Validation sets
  7. Modelling a classifier:
    • Fit a Random Forest classifier and analyse the importance of its features
    • Transform the text describing the accident and the vehicles involved using NLP techniques such as TF-IDF & Embeddings
    • Ensemble our individual models
  8. Ablation studies: fit a Linear Regression model and run ablation studies to measure the variance explained by the features
  9. Mutual Information Exploration: measure the mutual information (the mutual dependence between two variables) between each feature identified as important in the ablation study and the precision mark
  10. Hyper-parameter tuning: tune the model
  11. Train the final model: train the final model and build the submission file
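
Steps 6, 7 and 9 above can be sketched with scikit-learn as follows. This is an illustration on synthetic data, not the notebooks' actual code: the feature names, the toy target, the hyper-parameters, and the use of the class label as the second variable for mutual information are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered features (hypothetical names).
rng = np.random.default_rng(0)
n = 2000
feature_names = ["speed_limit", "n_vehicles", "noise"]
X = np.column_stack([
    rng.choice([20, 30, 40, 50, 60, 70], size=n),  # speed_limit
    rng.integers(1, 5, size=n),                    # n_vehicles
    rng.normal(size=n),                            # noise, unrelated to target
])
# Toy binary target: "severe" is more likely on high-speed roads.
p_severe = 0.05 + 0.6 * (X[:, 0] >= 60)
y = (rng.random(n) < p_severe).astype(int)

# Step 6: stratified Train/Validation split.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Step 7: Random Forest classifier and its impurity-based importances.
clf = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0)
clf.fit(X_train, y_train)
val_accuracy = clf.score(X_val, y_val)
importances = dict(zip(feature_names, clf.feature_importances_))

# Step 9: mutual information between each feature and the target.
mi_scores = mutual_info_classif(
    X_train, y_train,
    discrete_features=[True, True, False],  # first two columns are discrete
    random_state=0)
mutual_info = dict(zip(feature_names, mi_scores))
```

Impurity-based importances tend to favour high-cardinality features, so comparing them against the mutual information scores is a useful sanity check before dropping features.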

You can always view a notebook using https://nbviewer.jupyter.org/

Technologies

Figures

EDA: Exploring the target variable:

EDA: Daily Accidents in January:

Scatter: Vehicles vs Casualties:

Random Forest: Forest Importances:

ChatBot: Bite-size driving advice: