Predicting-Insurance-Fraud

This repo contains machine learning and data mining R code for a prediction framework that identifies fraudulent claims for a major general insurance organization. The model accurately predicted fraud in insurance claims. Results were obtained over multiple iterations of the entire data-science pipeline: data pre-processing, feature engineering, model selection, hyper-parameter tuning, and performance analysis on validation and test data.

Project Objectives:

  • To build a machine learning prediction model that classifies fraudulent insurance claims, using the F1 score (the harmonic mean of precision and recall) as the performance metric.
  • To use the decision tree algorithm specifically to analyse and extract the top 20 significant patterns in fraudulent claims.
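
For readers unfamiliar with the F1 metric named in the first objective, here is a minimal base R sketch of how it can be computed from actual and predicted labels. The helper name `f1_score`, the 1/0 label coding, and the sample vectors are assumptions made for illustration. They are not taken from the code in this repo, which may compute the metric differently.

```r
# Hypothetical helper (not from this repo): precision, recall and F1
# for the positive class, assumed here to be coded as 1 = fraud.
f1_score <- function(actual, predicted, positive = 1) {
  tp <- sum(predicted == positive & actual == positive)  # true positives
  fp <- sum(predicted == positive & actual != positive)  # false positives
  fn <- sum(predicted != positive & actual == positive)  # false negatives

  # Guard against division by zero when a class is never predicted or present
  precision <- if (tp + fp == 0) 0 else tp / (tp + fp)
  recall    <- if (tp + fn == 0) 0 else tp / (tp + fn)
  f1 <- if (precision + recall == 0) 0 else
    2 * precision * recall / (precision + recall)

  c(precision = precision, recall = recall, f1 = f1)
}

# Made-up labels for illustration only (1 = fraud, 0 = genuine)
actual    <- c(1, 0, 1, 1, 0, 0, 1, 0)
predicted <- c(1, 0, 0, 1, 0, 1, 1, 0)

scores <- f1_score(actual, predicted)
print(scores)
```

F1 is a common choice for fraud detection because fraudulent claims are usually a small minority of all claims, so plain accuracy can look high even when most fraud is missed.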

Folders in the repo:

  • Datasets: contains the train and test data
  • Code: contains the code files for the different parts of the model pipeline: data preprocessing, EDA, visualizations, model building, and analysis