Dealing_with_Imbalanced_Data

This project shows how to deal with imbalanced data and which performance metrics are particularly important, compared with the usual practice for fairly balanced data.

Primary language: Jupyter Notebook. License: Apache License 2.0.

Kaggle - Credit Card Fraud Detection

This repository contains an IPython Notebook for the Kaggle dataset at https://www.kaggle.com/mlg-ulb/creditcardfraud

  • The aim of this project is to classify transactions as fraudulent or non-fraudulent while dealing with imbalanced data. To achieve this, various supervised learning algorithms are used and their results are compared.

  • Imbalanced data refers to classification problems in which the classes are not represented equally; here, the two classes of a binary problem. Several methods address this problem, such as resampling, generating synthetic samples, anomaly detection methods, and using performance metrics other than accuracy.

  • In this project, undersampling is applied to the majority class. Performance metrics such as precision, recall, F1 score, and AUC are used, together with some anomaly detection methods such as one-class SVM and a neural network, to find the algorithm that best predicts fraudulent and non-fraudulent transactions.
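
As a rough illustration of the two ideas above (why accuracy misleads on imbalanced data, and what undersampling the majority class does), here is a minimal sketch using only the Python standard library. It is not the notebook's code: the function names `undersample_majority` and `precision_recall_f1` are hypothetical, the data is a toy list rather than the Kaggle dataset, and AUC is left out because it needs predicted scores rather than hard labels.

```python
import random
from collections import Counter


def undersample_majority(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Hypothetical helper for a binary problem; not taken from the notebook.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_label = min(counts, key=counts.get)
    n_minority = counts[minority_label]

    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    majority_idx = [i for i, y in enumerate(labels) if y != minority_label]

    # Keep every minority row plus an equally sized random majority sample.
    kept = minority_idx + rng.sample(majority_idx, n_minority)
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (assumption): 1000 transactions, 5 of them fraudulent (label 1).
labels = [1] * 5 + [0] * 995
rows = [[float(i)] for i in range(len(labels))]

# A "model" that always predicts non-fraud looks accurate but finds no fraud.
always_legit = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_legit)) / len(labels)
precision, recall, f1 = precision_recall_f1(labels, always_legit)
print(f"accuracy={accuracy:.3f} precision={precision} recall={recall} f1={f1}")

# After undersampling, both classes have the same number of rows.
balanced_rows, balanced_labels = undersample_majority(rows, labels)
print(Counter(balanced_labels))
```

On the toy data the always-non-fraud predictor reaches 99.5% accuracy while its recall on the fraud class is zero, which is the reason metrics other than accuracy are preferred here. Undersampling discards most majority rows, so in practice it trades information for balance.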

The dataset can be found at the link below:

https://www.kaggle.com/mlg-ulb/creditcardfraud

The project has 4 main topics:

  1. Data Exploration
  2. Hyperparameter Optimisation
  3. Model Building
  4. Comparing Performance Metrics

Dependencies