Fraudulent transactions pose a significant challenge for the banking and payment industry. They result from illegal credit card usage without the cardholder's knowledge. Detecting fraudulent charges in real time can help prevent financial losses and protect customer trust.
This project employs machine learning techniques to identify fraudulent transactions, specifically using unsupervised learning with Isolation Forest and Local Outlier Factor (LOF) algorithms.
The dataset used in this project contains approximately 140,000 masked credit card transactions, each represented by 15 numerical features. These features are the result of a PCA transformation, and for confidentiality reasons no specific information about the original attributes is available.
- Language: `Python`
- Libraries: `scikit-learn`, `pandas`, `matplotlib`, `numpy`, `seaborn`
The project follows a structured approach:
- Import the required libraries and packages.
- Open the `config.ini` file, a configuration file that can be edited to adapt to different datasets.
- Read the dataset containing masked credit card transaction data.
- Perform exploratory data analysis to understand the dataset.
- Handle missing values and preprocess the data.
- Determine the contamination amount for Isolation Forest.
- Train models using Isolation Forest and Local Outlier Factor.
- Make predictions and identify fraudulent transactions.
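The modeling steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic (15 random numerical features standing in for the masked dataset), the variable names are hypothetical, and the contamination value is taken from known synthetic labels. With real unlabeled data it would have to be estimated from domain knowledge or a labeled sample.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in for the masked dataset (assumption): 15 numerical
# features, mostly "normal" rows plus a few scattered anomalies.
rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(2000, 15))
anomalies = rng.uniform(low=-6.0, high=6.0, size=(20, 15))
X = np.vstack([normal, anomalies])
y_true = np.r_[np.zeros(len(normal), dtype=int), np.ones(len(anomalies), dtype=int)]

# Contamination is the expected share of anomalies in the data.
# Here it is derived from the synthetic labels (assumption).
contamination = float(y_true.mean())

# Isolation Forest: isolates points with random splits; anomalies need fewer splits.
iso = IsolationForest(n_estimators=100, contamination=contamination, random_state=42)
iso_pred = iso.fit_predict(X)  # +1 = inlier, -1 = outlier

# Local Outlier Factor: compares each point's local density to that of its neighbors.
lof = LocalOutlierFactor(n_neighbors=20, contamination=contamination)
lof_pred = lof.fit_predict(X)  # +1 = inlier, -1 = outlier

# Map scikit-learn's convention to 1 = flagged as fraud, 0 = not flagged.
iso_fraud = (iso_pred == -1).astype(int)
lof_fraud = (lof_pred == -1).astype(int)

print("Isolation Forest flagged:", int(iso_fraud.sum()))
print("Local Outlier Factor flagged:", int(lof_fraud.sum()))
```

Both estimators return `-1` for outliers and `+1` for inliers, so the final mapping step is what turns their output into a fraud flag. Note that `LocalOutlierFactor` in its default mode only supports `fit_predict` on the training data; scoring unseen transactions would require `novelty=True`.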
- input: Contains dataset files and a `config.ini` configuration file.
- src: The core of the project, with modularized code organized into the `engine.py` file and the `ml_pipeline` folder. The `ml_pipeline` folder contains functions in separate Python files, used within `engine.py`.
- output: Contains pre-trained models saved as `.pkl` files for future use.
- lib: A reference folder containing the IPython notebooks.
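As a rough illustration of how the `config.ini` file and the `.pkl` model files might fit together, the sketch below reads settings with `configparser`, trains a model, and round-trips it through `pickle`. The section name, keys, and file name are hypothetical, since the actual contents of `config.ini` are not documented here, and a temporary directory stands in for the output folder.

```python
import configparser
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical configuration contents; the real config.ini keys may differ.
CONFIG_TEXT = """
[model]
contamination = 0.01
n_estimators = 100
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)  # a real run would call config.read(<path to config.ini>)
contamination = config.getfloat("model", "contamination")
n_estimators = config.getint("model", "n_estimators")

# Synthetic data with 15 numerical features (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 15))

model = IsolationForest(
    n_estimators=n_estimators,
    contamination=contamination,
    random_state=0,
).fit(X)

# Save the trained model as a .pkl file, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "isolation_forest.pkl"  # hypothetical file name
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions.
same_predictions = bool(np.array_equal(model.predict(X), restored.predict(X)))
print("Restored model matches original:", same_predictions)
```

Pickled scikit-learn models should be loaded with the same library version that produced them, and only from trusted sources, since unpickling can execute arbitrary code.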