We developed a Machine Learning pipeline to predict accident severity and a ChatBot that provides bite-sized advice to drivers!
Winners at HackUPC 2019 for McKinsey & Company's challenge in Barcelona
- Slides: Insighters Hack the Crash at HackUPC 2019
- YouTube: Video
The submission for the predictions is here
The rate of road accidents and fatalities in the UK is very high.
This challenge focuses on predicting accident severity based on the following parameters:
- Road location
- Road type and its speed limit
- Weather conditions
- Details of vehicles involved in the accident
- and many more!
Predicting accident severity will help the authorities closely monitor accident-prone road segments and carry out root cause analyses for such mishaps.
Accident training data | Vehicle data |
---|---|
137,599 | 455,652 |
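
Since the vehicle table is larger than the accident table, one plausible way to combine them is to summarise vehicles per accident and then join. The sketch below assumes the two tables share an accident identifier; the column names (`accident_id`, `speed_limit`, `vehicle_age`) and the toy values are hypothetical and not taken from the challenge data.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions, not the challenge schema.
accidents = pd.DataFrame({
    "accident_id": [1, 2, 3],
    "speed_limit": [30, 60, 70],
})
vehicles = pd.DataFrame({
    "accident_id": [1, 1, 2, 3, 3, 3],
    "vehicle_age": [2, 10, 5, 1, 7, 4],
})

# Collapse the vehicle table to one row per accident.
vehicle_summary = (
    vehicles.groupby("accident_id")
    .agg(
        n_vehicles=("vehicle_age", "size"),
        mean_vehicle_age=("vehicle_age", "mean"),
    )
    .reset_index()
)

# A left join keeps every accident, even one with no matching vehicle rows.
merged = accidents.merge(vehicle_summary, on="accident_id", how="left")
```

Aggregating before the join keeps one row per accident, which is the granularity at which severity is predicted.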
- Exploratory Data Analysis: inspect all the fields and their values
- Data Wrangling:
- Data Summarization:
  - Correlations & Scatter plots
  - Data Distributions
- Feature Engineering: handcraft features from the raw fields
- Feature Selection & Dimensionality Reduction: PCA, t-SNE, UMAP
- Split the data: split the dataset into Train and Validation sets
- Building a classification model:
  - Fit a Random Forest classifier and analyse the predictive power of its features
  - Transform the text describing the accident and the vehicles involved using NLP techniques such as TF-IDF and embeddings
  - Ensemble the individual models
- Ablation studies: fitting a Linear Regression model and running ablation studies to measure the variance explained by the features
- Mutual Information Exploration: measuring the mutual information (the mutual dependence between two variables) between each feature identified as important in the ablation study and the precision mark.
- Hyper-parameter tuning: tune the model
- Train the final model: train the final model and build the submission file
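
As a rough illustration of the split, Random Forest, and mutual information steps above, here is a minimal scikit-learn sketch. It runs on synthetic data from `make_classification` rather than the challenge dataset, and the three-class target, feature count, and hyper-parameters are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered feature matrix (assumption).
X, y = make_classification(
    n_samples=500,
    n_features=8,
    n_informative=3,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

# Stratified split so each severity class keeps its share in both sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit a Random Forest and score it on the held-out validation set.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
val_accuracy = model.score(X_val, y_val)

# Impurity-based importances (they sum to 1), ranked from most to least important.
importances = model.feature_importances_
ranking = np.argsort(importances)[::-1]

# Mutual information between each feature and the target, as a second view.
mi = mutual_info_classif(X_train, y_train, random_state=0)

print(f"validation accuracy: {val_accuracy:.3f}")
print("features ranked by importance:", ranking.tolist())
```

Comparing the importance ranking with the mutual information scores is a cheap sanity check: features that rank highly on both are less likely to be artefacts of one particular model.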
You can always view a notebook using https://nbviewer.jupyter.org/
- Python 3
- Docker
- keras
- tensorflow
- numpy
- pandas
- matplotlib
- scikit-learn
- pillow
- jupyter
- Telegram
- Python Telegram Bot
EDA: Exploring the target variable:
EDA: Daily Accidents in January:
Scatter: Vehicles vs Casualties: