ML-Projects
Machine Learning Projects
This project was part of the Data Programming course at Georgia State University's Robinson College of Business.
About Project:
To predict the probability that a driver will initiate an auto insurance claim in the next year.
Most companies charge customers a flat premium irrespective of their risk of filing an insurance claim. Inaccuracies in a car insurance company’s claim predictions raise the cost of insurance for good drivers and reduce the price for bad ones. Our project can help an insurance company in the following ways:
Affluent customer: The company can attract good drivers by pricing fairly.
Loss ratio: The company can avoid specific customers or policies that carry a high risk of filing a claim, which in turn decreases the loss ratio.
Fair pricing: The company can set each customer's premium according to their risk, and accurate predictions allow it to tailor prices further.
Claim forecast: The number of claims is proportional to the number of risky customers, so the company can forecast how many claims it is likely to receive next year, which helps it manage its funds better.
Data Source:
The data for the project is available from the Kaggle competition Porto Seguro’s Safe Driver Prediction (train.csv): https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/data
Python Notebooks:
Insurance Claim Data Exploration - Contains the initial exploration of the data: the distribution of the target variable, missing values, correlated variables, the distribution of categorical variables against the target variable, etc.
Insurance Claim ML Model - Contains the code for handling missing data in interval and continuous variables, imputing missing data for categorical variables, one-hot encoding (dummification) of the categorical variables, outlier treatment, feature scaling, and various feature engineering methods used to create the final machine learning models.
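The kinds of steps the two notebooks describe can be illustrated with a minimal pandas sketch. This is not the notebooks' actual code. The toy data frame, the column names, the `explore` and `preprocess` helpers, the assumption that missing values are coded as -1, and the specific choices (mode and median imputation, IQR capping, standardization) are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train.csv; column names and values are illustrative only.
raw = pd.DataFrame({
    "target":        [0, 0, 0, 1, 0, 0, 1, 0],
    "ps_car_01_cat": [1, 2, -1, 1, 2, 2, 1, 2],
    "ps_reg_03":     [0.5, -1.0, 0.7, 0.9, 0.6, 9.0, 0.8, 0.4],
})
CAT_COLS = ["ps_car_01_cat"]  # hypothetical categorical column list
NUM_COLS = ["ps_reg_03"]      # hypothetical continuous column list


def explore(df):
    """Return a few exploration summaries as a dict."""
    df = df.replace(-1, np.nan)  # assumption: -1 marks a missing value
    return {
        # Share of each target class (shows the class imbalance).
        "target_share": df["target"].value_counts(normalize=True),
        # Fraction of missing values per column.
        "missing_share": df.isna().mean(),
        # Claim rate per level of a categorical variable.
        "claim_rate": df.groupby("ps_car_01_cat")["target"].mean(),
    }


def preprocess(df):
    """Impute, cap outliers, scale, and one-hot encode a copy of df."""
    df = df.replace(-1, np.nan)  # returns a new frame; the input is untouched
    for col in CAT_COLS:
        # Categorical: fill gaps with the most frequent level.
        df[col] = df[col].fillna(df[col].mode().iloc[0])
    for col in NUM_COLS:
        # Continuous: fill gaps with the median.
        df[col] = df[col].fillna(df[col].median())
        # Outlier treatment: cap values beyond 1.5 * IQR from the quartiles.
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df[col] = df[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
        # Feature scaling: standardize to zero mean and unit variance.
        df[col] = (df[col] - df[col].mean()) / df[col].std(ddof=0)
    # One-hot encode (dummify) the categorical columns.
    return pd.get_dummies(df, columns=CAT_COLS, dtype=int)


summary = explore(raw)
features = preprocess(raw)
```

In a real workflow the imputation values, outlier caps, and scaling statistics would normally be computed on the training split only and then reused on validation and test data, so that no information leaks from the held-out rows.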
Acknowledgment:
Entire Kaggle community
https://www.kaggle.com/bertcarremans/data-preparation-exploration
https://www.kaggle.com/anokas/simple-xgboost-btb-0-27
https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets
https://www.analyticsvidhya.com/