Challenge: Catch the Fraudster

Claire Smith (Instructor) · Projects and Portfolio Ideas · 21 days ago

This challenge has been provided by Sunil Kappal.

Challenge deadline is July 14th 2017

Submit an entry for this challenge

If you have any questions or feedback about this challenge, please post them below.

General Challenge FAQ

Overview

Identify the fraud propensity for a retail company based on a dataset of 4K rows with 1 target variable (Fraud Instance) and 11 predictor variables.

Basic Information

Length – 1-2 weeks
Group Size – 1-2 individuals
Difficulty – Beginner
Prerequisite Knowledge – An understanding of basic statistics and data types, and the ability to use a data mining tool, are required.
Required/Recommended Technology – Use R-based GUI tools to perform this task (recommended GUIs: Rattle, Rcmdr, Deducer). Because the dataset is very small, even Excel can be helpful.

Background

Background/Context – Every retail chain faces potential fraud instances in which people order a product and then return it after some days, claiming that the product either doesn’t work or doesn’t provide the desired utility. However, each such transaction has some precursors that may point towards a potential fraud instance.
Target Audience – The retail industry’s risk management team will benefit highly from this analysis.
Portfolio Development – Upon completion of this project, students will have learned:
1. How to deploy specific analytical techniques based on specific data types
2. How to use R, one of the most in-demand data science tools, and its GUI-based data mining packages
3. How to model a data set with binary, categorical and continuous data
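
One common way to handle a mix of binary, categorical and continuous predictors is to convert them all to numbers before modelling. The sketch below is only an illustration in plain Python: the field names (`gift_wrapped`, `payment_type`, `order_value`) and the three rows are hypothetical and are not taken from the challenge dataset. It maps a yes/no field to 0/1, one-hot encodes a categorical field, and min-max scales a continuous field.

```python
# Hypothetical rows: field names and values are made up for illustration,
# not taken from the challenge dataset.
rows = [
    {"gift_wrapped": "yes", "payment_type": "card", "order_value": 120.0},
    {"gift_wrapped": "no", "payment_type": "cash", "order_value": 35.5},
    {"gift_wrapped": "no", "payment_type": "voucher", "order_value": 80.0},
]


def encode(rows, binary, categorical, continuous):
    """Turn mixed-type records into purely numeric feature vectors."""
    # Sorted list of levels seen for each categorical column.
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}
    # Min and max per continuous column, used for min-max scaling to [0, 1].
    bounds = {c: (min(r[c] for r in rows), max(r[c] for r in rows))
              for c in continuous}

    names = list(binary)
    names += [f"{c}={lvl}" for c in categorical for lvl in levels[c]]
    names += list(continuous)

    encoded = []
    for r in rows:
        # Binary fields: "yes" becomes 1.0, anything else 0.0.
        vec = [1.0 if r[c] == "yes" else 0.0 for c in binary]
        # Categorical fields: one indicator column per level.
        for c in categorical:
            vec += [1.0 if r[c] == lvl else 0.0 for lvl in levels[c]]
        # Continuous fields: scaled to [0, 1]; constant columns map to 0.0.
        for c in continuous:
            lo, hi = bounds[c]
            vec.append((r[c] - lo) / (hi - lo) if hi > lo else 0.0)
        encoded.append(vec)
    return names, encoded


names, X = encode(rows, ["gift_wrapped"], ["payment_type"], ["order_value"])
for name, value in zip(names, X[0]):
    print(name, value)
```

When the encoded columns feed a regression model with an intercept, one level of each categorical field is usually dropped to avoid perfectly collinear columns.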

Description

The project revolves around creating a working predictive model that predicts the propensity of a fraud instance given certain conditions (predictor variables). The aim of this project is to create a fully deployable predictive algorithm capable of predicting fraud occurrence. The individual or team is expected to clearly explain the entire analytical and algorithm-building process, step by step, in the form of a presentation. The individual or team can use any statistical tool of their choice to deliver the work output.
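
As one possible illustration of what a propensity model looks like, the sketch below fits a logistic regression by batch gradient descent using only the Python standard library. Logistic regression is an assumption here, chosen because it outputs a probability for a binary target; the challenge does not prescribe a technique. The data is synthetic toy data with two made-up predictors, not the challenge dataset, and the learning rate and epoch count are arbitrary.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            # Prediction error for this row drives the gradient.
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity(w, x):
    """Predicted probability of fraud for one feature vector."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Synthetic toy data, NOT the challenge dataset: two made-up predictors
# in [0, 1], with fraud made more likely when both are high.
random.seed(0)
X = [[random.random(), random.random()] for _ in range(400)]
y = [1 if random.random() < sigmoid(6 * (a + b - 1)) else 0 for a, b in X]

w = fit_logistic(X, y)
high = propensity(w, [0.9, 0.9])
low = propensity(w, [0.1, 0.1])
print("propensity when both predictors are high:", round(high, 3))
print("propensity when both predictors are low:", round(low, 3))
```

The same idea is available through the R GUIs recommended for this challenge; the point of the sketch is only that the model returns a probability per transaction rather than a hard yes/no label.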

Success Criteria – A step-by-step model-building presentation and a working algorithm
Data Access and Usage Limitations – Users can create features out of the data if required, with a valid reason for taking that route. Because this is a fairly easy and light dataset, creating too many features may reduce the chances of success.
Data – The dataset has 4K rows with 12 variables, of which 1 is the response variable (Fraud Instance) and the remaining 11 are predictor variables.
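
Before modelling, it can help to confirm that the file matches this description. The sketch below is a minimal check in plain Python, assuming the data is available as a CSV with a header row. The column names and the three inline rows are hypothetical stand-ins; on the real file one would expect about 4K rows and 11 predictor names.

```python
import csv
import io
from collections import Counter


def summarize(handle, target):
    """Return (row count, predictor names, target class counts) for a CSV."""
    reader = csv.DictReader(handle)
    if target not in (reader.fieldnames or []):
        raise ValueError(f"target column {target!r} not found")
    predictors = [c for c in reader.fieldnames if c != target]
    counts = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        counts[row[target]] += 1
    return n_rows, predictors, dict(counts)


# Tiny inline stand-in for the real file; column names and values are hypothetical.
sample = io.StringIO(
    "order_value,payment_type,fraud_instance\n"
    "120.0,card,0\n"
    "35.5,cash,0\n"
    "80.0,voucher,1\n"
)
n_rows, predictors, counts = summarize(sample, target="fraud_instance")
print(n_rows, predictors, counts)
```

The class counts are worth a look early on, since fraud cases are often much rarer than legitimate ones and that affects how a model should be evaluated.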