In this project I present some ideas on how to deal with unbalanced data, how to use metrics other than the typical accuracy measure, and how to use trees (a lot of trees): CART, random forest and XGBoost, with both balanced and unbalanced data. The goal of all this is a good prediction for a very small number of fraud cases.
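To illustrate why accuracy alone can mislead on unbalanced data, here is a minimal sketch in base R. The labels are made up for illustration (they are not taken from the project's dataset), and the "model" is a hypothetical one that always predicts the majority class.

```r
# Hypothetical toy labels: 1 = fraud, 0 = legitimate (NOT the real dataset).
actual <- c(rep(0, 995), rep(1, 5))

# A naive "model" that always predicts the majority class (no fraud).
predicted <- rep(0, 1000)

# Accuracy: share of predictions that match the true label.
accuracy <- mean(predicted == actual)

# Confusion-matrix counts for the positive (fraud) class.
tp <- sum(predicted == 1 & actual == 1)  # frauds correctly flagged
fn <- sum(predicted == 0 & actual == 1)  # frauds missed
fp <- sum(predicted == 1 & actual == 0)  # false alarms

# Recall: share of real frauds that were caught.
recall <- tp / (tp + fn)

# Precision is undefined when nothing is flagged, so return NA in that case.
precision <- if (tp + fp == 0) NA else tp / (tp + fp)

print(c(accuracy = accuracy, recall = recall, precision = precision))
```

In this toy case the accuracy looks excellent even though not a single fraud is detected, which is why metrics such as recall and precision are more informative for this kind of problem.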
Note: everything here is done in R.
The dataset used is in ./data/creditCard.zip.
The code provided in the .R file will unzip and prepare the data.
If you run into any problems, you can download the dataset from Kaggle: Credit Card Fraud Detection.
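For reference, the unzip step could look roughly like the sketch below. It is not the code from the project's .R file: the helper name `load_credit_data` is hypothetical, and it assumes the archive contains a single CSV file with a `Class` column marking fraud cases, as in the Kaggle version of the data.

```r
# Hypothetical helper (not the project's own code): unzip the archive into a
# temporary directory and read the first CSV file found inside it.
load_credit_data <- function(zip_path = "./data/creditCard.zip",
                             exdir = tempdir()) {
  if (!file.exists(zip_path)) {
    stop("Zip file not found: ", zip_path)
  }
  # utils::unzip returns the paths of the extracted files.
  extracted <- utils::unzip(zip_path, exdir = exdir)
  csv_files <- extracted[grepl("\\.csv$", extracted, ignore.case = TRUE)]
  if (length(csv_files) == 0) {
    stop("No CSV file found inside ", zip_path)
  }
  read.csv(csv_files[1])
}

# Usage: only runs when the archive is present.
if (file.exists("./data/creditCard.zip")) {
  credit <- load_credit_data()
  # Assumption: the fraud label is stored in a column named Class.
  print(table(credit$Class))
}
```

Tabulating the label column right after loading is a quick way to see how unbalanced the classes are before any modelling.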