Using the machine learning algorithms Perceptron, Logistic Regression, and Random Forest in PySpark and Jupyter Notebooks to detect fraudulent transactions.

Big Data Approach for Credit Card Fraud Detection

Course Project for CS6502 - Applied Big Data and Visualization

Technologies Utilised:

  • PySpark
  • Google Colab Notebooks
  • Microsoft Power BI

Overview

This Big Data project focuses on credit card fraud detection. It was developed as part of the course "Applied Big Data and Visualization" (CS6502) at the University of Limerick, taught by Dr. Andrew Ju.

Description

Credit card fraud is a pervasive problem that has grown more frequent in recent years. Our project tackles it with big data analytics and machine learning: by examining transactional data, we aim to uncover patterns and anomalies that indicate fraudulent activity in real time, and so help secure financial transactions and limit the spread of fraud.

The output of the project is a prediction of whether a transaction is fraudulent or not, based on the historical data used to train three machine learning models: Perceptron, Logistic Regression, and Random Forest.

We used PySpark within notebooks on Google Colab. By drawing on in-memory processing, distributed computing, fault tolerance, and advanced analytics, we aimed to improve the accuracy and efficiency of fraud detection. The goal of the project is to safeguard financial institutions and consumers against malicious activity.

Dataset

Dataset Link

Notebook Contents

The notebook covers the stages of the project: data loading, exploratory data analysis (EDA), data preprocessing, model training, and evaluation. It demonstrates techniques for handling imbalanced datasets, data cleaning, and feature engineering.
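
As a rough illustration of handling an imbalanced dataset, the sketch below shows random undersampling in plain Python. It is not taken from the notebook: the function name `undersample`, the `is_fraud` field, and the toy records are assumptions made for illustration, and a PySpark version would apply the same idea to a DataFrame rather than a list.

```python
import random


def undersample(rows, label_key="is_fraud", seed=42):
    """Randomly drop majority-class rows until both classes have equal size.

    `rows` is a list of dicts; `label_key` (a hypothetical field name) holds
    1 for fraudulent and 0 for legitimate transactions.
    """
    fraud = [r for r in rows if r[label_key] == 1]
    legit = [r for r in rows if r[label_key] == 0]

    # Work out which class is the minority and which is the majority.
    minority, majority = sorted([fraud, legit], key=len)

    # Keep every minority row plus an equally sized random sample of the majority.
    rng = random.Random(seed)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Toy data (made up for illustration): 5 fraudulent and 95 legitimate rows.
transactions = [
    {"amount": 10.0 * i, "is_fraud": 1 if i % 20 == 0 else 0}
    for i in range(100)
]
balanced = undersample(transactions)
```

The trade-off is that most legitimate transactions are discarded, so the model trains on far less data in exchange for a balanced class distribution.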

Notebook Link

Notebook Readme Link

Power BI Dashboard

Our Power BI dashboard analyzes credit card transactions with the aim of detecting fraudulent activity. Through a series of visualizations, it explores several aspects of the transaction data to uncover patterns and anomalies that may indicate fraud. The dashboard provides insights across multiple dimensions.

Power BI Dashboard Link

Key Takeaways

  • Incorporating large-scale data analytics and machine learning tools increased the efficiency of fraud detection and monitoring in our project.
  • Undersampling the majority (non-fraudulent) class worked well for model training on the imbalanced dataset.
  • The Perceptron classifier gave a marginally higher recall than the other models, so it is the better choice when reducing false negatives (missed fraud) matters most.
  • Visualizations help in understanding the dataset and in making data-driven decisions.
  • Continuous vigilance and adaptation are essential in combating evolving fraud tactics.
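
To make the recall point concrete: recall is TP / (TP + FN), so a higher recall means fewer fraudulent transactions are missed as false negatives. The sketch below computes it from true labels and predictions. The function name and the toy labels are made up for illustration and are not results from the project.

```python
def recall(y_true, y_pred, positive=1):
    """Return TP / (TP + FN) for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive examples")
    return tp / (tp + fn)


# Toy labels (hypothetical): 1 = fraudulent, 0 = legitimate.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 1, 0, 0, 0, 1, 0]  # misses one of four frauds
model_b = [1, 1, 0, 0, 0, 0, 0, 0]  # misses two of four frauds

recall_a = recall(y_true, model_a)  # 3 / (3 + 1)
recall_b = recall(y_true, model_b)  # 2 / (2 + 2)
```

In this toy case the first model has the higher recall even though it raises one false alarm, which is the kind of trade-off that favours a high-recall classifier for fraud detection.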