This repo contains machine learning and data mining R code for a prediction framework that identifies fraudulent claims for a major general insurance organization. The model accurately predicted fraud in insurance claims. Results were obtained over multiple iterations of the full data science pipeline: data preprocessing, feature engineering, model selection, hyperparameter tuning, and performance analysis on validation and test data.
Project Objectives:
- To build a machine learning model that classifies insurance claims as fraudulent or non-fraudulent, using the F1 statistic as the evaluation metric for model performance
- To use a decision tree algorithm to analyse fraudulent claims and extract the 20 most significant patterns
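For reference, the F1 statistic is the harmonic mean of precision and recall. The formulas below assume fraudulent claims are treated as the positive class (the objectives above do not state this explicitly), with TP, FP and FN counting true positives, false positives and false negatives:

$$
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN}
$$

Because F1 ignores true negatives, it is harder to inflate than plain accuracy when fraudulent claims make up only a small fraction of all claims.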
Folders in the repo:
- Datasets: contains the training and test data
- Code: contains the code for each stage of the model pipeline (data preprocessing, EDA, visualizations, model building, and analysis)