Predicting-Insurance-Fraud

This repo contains machine learning and data mining R code for a prediction framework that identifies fraudulent claims for a major general insurance organization. The model accurately predicted fraud in insurance claims. Results were obtained over multiple iterations of the entire data science pipeline: data preprocessing, feature engineering, model selection, hyperparameter tuning, and performance analysis on validation and test data.

Project Objectives:

To build a machine learning model that classifies insurance claims as fraudulent or legitimate, using the F1 score (the harmonic mean of precision and recall) as the evaluation metric.
To use the decision tree algorithm to analyse fraudulent claims and extract the 20 most significant fraud patterns.
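The two objectives above can be illustrated with a minimal R sketch. It is not the code in this repo. It assumes the `rpart` package (shipped with standard R installations), a binary label column `fraud` with levels "legit" and "fraud", and three made-up features on synthetic data; the helper names `f1_score` and `top_fraud_patterns`, the column names, and the `cp`/`minbucket` values are all hypothetical. F1 is computed as 2 · precision · recall / (precision + recall) for the "fraud" class, and each leaf of the fitted tree is treated as one pattern (the conjunction of splits on its root-to-leaf path).

```r
# Minimal sketch, NOT the repo's code. Assumptions (hypothetical): label column
# `fraud` with levels "legit"/"fraud", illustrative features, synthetic data.
library(rpart)  # recursive partitioning trees

# F1 = harmonic mean of precision and recall for the positive ("fraud") class.
f1_score <- function(actual, predicted, positive = "fraud") {
  tp <- sum(predicted == positive & actual == positive)
  fp <- sum(predicted == positive & actual != positive)
  fn <- sum(predicted != positive & actual == positive)
  if (tp == 0) return(0)  # avoids 0/0 when no fraud is correctly flagged
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

# Rank tree leaves (one rule each) by estimated fraud probability, then size.
top_fraud_patterns <- function(fit, k = 20, positive = "fraud") {
  frame <- fit$frame
  leaves <- which(frame$var == "<leaf>")
  lev <- attr(fit, "ylevels")
  # yval2 columns: fitted class, per-class counts, per-class probs, node prob
  prob_col <- 1 + length(lev) + match(positive, lev)
  node_ids <- as.integer(rownames(frame)[leaves])
  paths <- path.rpart(fit, nodes = node_ids, print.it = FALSE)
  # Drop the leading "root" entry and join the split conditions.
  rules <- vapply(paths, function(p) paste(p[-1], collapse = " & "), character(1))
  out <- data.frame(node = node_ids, n = frame$n[leaves],
                    fraud_rate = frame$yval2[leaves, prob_col],
                    rule = unname(rules), stringsAsFactors = FALSE)
  out <- out[order(-out$fraud_rate, -out$n), ]
  head(out, k)
}

# Synthetic demo data (NOT the project's claims data).
set.seed(42)
n <- 2000
claims <- data.frame(
  claim_amount = rlnorm(n, meanlog = 8, sdlog = 1),
  days_since_policy_start = sample(1:1000, n, replace = TRUE),
  num_prior_claims = rpois(n, 1)
)
logit <- -3 + 0.0001 * claims$claim_amount -
  0.002 * claims$days_since_policy_start + 0.5 * claims$num_prior_claims
claims$fraud <- factor(ifelse(runif(n) < plogis(logit), "fraud", "legit"),
                       levels = c("legit", "fraud"))

# Classification tree; cp and minbucket are illustrative, not tuned values.
tree <- rpart(fraud ~ ., data = claims, method = "class",
              control = rpart.control(cp = 0.001, minbucket = 20))

pred <- predict(tree, claims, type = "class")
cat("Training F1:", f1_score(claims$fraud, pred), "\n")

patterns <- top_fraud_patterns(tree, k = 20)
print(patterns, row.names = FALSE)
```

Ranking leaves by fraud probability and breaking ties by leaf size is only one simple reading of "most significant"; in practice the F1 score would be reported on held-out validation and test data rather than on the training set.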

Folders in the repo:

Datasets: the train and test data
Code: code files for each stage of the model pipeline (data preprocessing, EDA, visualizations, model building and analysis)

rohilrao/Predicting-Insurance-Fraud