/alpha-insurance

Looking for factors indicating fraud using insurance claims data.

Primary Language: Jupyter Notebook

Alpha-Insurance-Fraud-Detection

You have been hired by Alpha Insurance to develop predictive models that determine which automobile claims are fraudulent. You have been given data on approximately 5,000 auto claims, including a variable indicating whether the company believes each claim is fraudulent.

Author:

  • Robert Shea

Bryant University ~ Fall 2018

Hypothesis

These variables are expected to be the strongest indicators of fraudulent claims:

  • Claim Amount - Unusually high claim amounts are more likely to be fraudulent.
  • Claim Cause - Claims with more severe causes (fire and collision) are less likely to be fraudulent.
  • Claim Report Type - Fraudulent claims are more likely to be reported through channels that involve as little human interaction as possible.
  • Employment Status - Claimants who are not currently employed are more likely to report fraudulent claims.
  • Education / Income - The higher the claimant's level of education, the less likely a claim is to be fraudulent. (Education may also be linked with income.)
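
One way to sanity-check a hypothesis like the Claim Amount one is to compare fraud rates between unusually high claims and the rest. The sketch below is illustrative only: the data is made up, and the column names (`claim_amount`, `fraudulent`) and the 75th-percentile cutoff are assumptions, not the project's actual schema or threshold.

```python
import pandas as pd

# Made-up toy data; column names are hypothetical, not the real dataset schema.
claims = pd.DataFrame({
    "claim_amount": [500, 700, 650, 800, 900, 12000, 750, 15000, 600, 620],
    "fraudulent":   [0,   0,   0,   0,   0,   1,     0,   1,     0,   1],
})

# Assumption: treat claims above the 75th percentile as "unusually high".
threshold = claims["claim_amount"].quantile(0.75)
claims["amount_group"] = (claims["claim_amount"] > threshold).map(
    {True: "high", False: "typical"}
)

# Share of claims flagged as fraudulent within each group.
fraud_rate = claims.groupby("amount_group")["fraudulent"].mean()
print(fraud_rate)
```

A higher fraud rate in the "high" group would be consistent with the hypothesis, though a gap on a sample this small proves nothing by itself.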

Process

Data Exploration

  • Univariate exploration
  • Bivariate exploration
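
As a rough illustration of the two exploration steps above, the sketch below summarizes single variables and then cross-tabulates one predictor against the fraud flag. The data is invented and the column names (`claim_amount`, `employment_status`, `fraudulent`) are assumptions.

```python
import pandas as pd

# Made-up toy data; column names are hypothetical.
claims = pd.DataFrame({
    "claim_amount": [500, 700, 650, 800, 900, 12000, 750, 15000],
    "employment_status": ["employed", "employed", "unemployed", "employed",
                          "employed", "unemployed", "employed", "unemployed"],
    "fraudulent": [0, 0, 0, 0, 1, 1, 0, 1],
})

# Univariate: distribution of one variable at a time.
amount_summary = claims["claim_amount"].describe()
status_counts = claims["employment_status"].value_counts()

# Bivariate: fraud rate within each employment status (each row sums to 1).
fraud_by_status = pd.crosstab(
    claims["employment_status"], claims["fraudulent"], normalize="index"
)

print(amount_summary)
print(status_counts)
print(fraud_by_status)
```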

Transformations

  • Impute missing values
  • Handle outliers
  • Transform variables with functions
  • Transform variables with binning
  • Encoding
  • Balancing the sample
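
The transformation steps listed above could be sketched as follows. This is a minimal example on invented data, not the notebook's actual code: the column names, the median imputation, the 1.5 × IQR capping rule, the income band edges, and the undersampling approach are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up toy data; column names are hypothetical.
claims = pd.DataFrame({
    "claim_amount": [500.0, 700.0, np.nan, 800.0, 900.0, 50000.0, 750.0, 1200.0],
    "income": [30000, 45000, 52000, 61000, 38000, 75000, 120000, 20000],
    "claim_cause": ["collision", "fire", "hail", "collision",
                    "theft", "theft", "hail", "fire"],
    "fraudulent": [0, 0, 0, 0, 0, 1, 0, 1],
})

# 1. Impute missing values with the column median.
claims["claim_amount"] = claims["claim_amount"].fillna(claims["claim_amount"].median())

# 2. Handle outliers by capping at the 1.5 * IQR fences.
q1, q3 = claims["claim_amount"].quantile([0.25, 0.75])
iqr = q3 - q1
claims["claim_amount"] = claims["claim_amount"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# 3. Transform with a function: log(1 + x) reduces right skew.
claims["log_income"] = np.log1p(claims["income"])

# 4. Transform with binning (band edges are arbitrary here).
claims["income_band"] = pd.cut(
    claims["income"], bins=[0, 40000, 80000, np.inf], labels=["low", "middle", "high"]
)

# 5. Encode categorical variables as one-hot indicator columns.
encoded = pd.get_dummies(claims, columns=["claim_cause", "income_band"])

# 6. Balance the sample by undersampling the non-fraud majority class.
fraud = encoded[encoded["fraudulent"] == 1]
legit = encoded[encoded["fraudulent"] == 0].sample(n=len(fraud), random_state=0)
balanced = pd.concat([fraud, legit]).reset_index(drop=True)

print(balanced["fraudulent"].value_counts())
```

Undersampling is only one balancing option; oversampling the minority class or using class weights in the model are common alternatives, and balancing is usually applied to the training split only.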

Modeling

  • Regression
  • Decision Tree
  • Neural Network
  • Other
  • Model Selection
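
A possible shape for the modeling and model selection steps, using scikit-learn. This sketch assumes that "Regression" means logistic regression (the target is binary) and that models are compared by cross-validated ROC AUC; neither is confirmed by the project description. The data is synthetic, generated with `make_classification`, and the hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared claims data (roughly 20% positive class).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

# Candidate models; scaling is added where the estimator is sensitive to it.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    ),
}

# Model selection: mean ROC AUC over 5-fold cross-validation.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in models.items()
}
best_model = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("selected:", best_model)
```

ROC AUC is used here because accuracy can be misleading when fraudulent claims are a small minority; other metrics such as precision and recall may suit the business problem better.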

Sources