This project implements an end-to-end pipeline for detecting credit card fraud with machine learning, built on the Kaggle Credit Card Fraud Dataset.
- Data Version Control (DVC): We use DVC to version the dataset and model files, which keeps the pipeline reproducible and makes collaboration easier.
- Serving with BentoML: Our model is served with BentoML, a framework for packaging and serving machine learning models, so it can be integrated into production environments.
- XGBoost Binary Classifier: The core of our fraud detection system is a binary classifier built with the XGBoost native library.
To run this project, follow these simple steps:
- Clone the repository:

      git clone https://github.com/andugu/credit-fraud.git

- Install the required dependencies:

      pip install -r requirements.txt

- Use DVC to reproduce the pipeline:

      dvc repro

- Start serving the model on localhost, port 8000:

      bentoml serve bentoml/FraudClassifier --port 8000

To run the service on a different port, replace 8000 with the desired port.
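Once the service is running, it can be queried over HTTP. The sketch below only builds the request, using the Python standard library. The endpoint name `predict`, the JSON payload shape, and the feature values are hypothetical, since the service's actual API is not described here; check the service definition for the real endpoint and input format.

```python
import json
import urllib.request


def build_request(features, port=8000, endpoint="predict"):
    """Build (but do not send) a JSON POST request for the local service.

    `endpoint` and the payload layout are assumptions for illustration.
    """
    url = f"http://localhost:{port}/{endpoint}"
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Made-up feature values; use the same port that was passed to `--port`.
request = build_request({"Amount": 100.0, "V1": -1.0}, port=8000)

# With the service running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

If the service is started on another port, pass that port to `build_request` so the URL matches.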
The DAG of the project is as follows:
           +---------+
           | Prepare |
           +---------+
                |
                |
                |
        +----------------+
        | Feature Engine |
        +----------------+
                |
                |
                |
           +---------+
           |  Train  |
           +---------+
                |
                |
                |
           +--------+
           |  Pack  |
           +--------+
                |
                |
                |
           +---------+
           |  Serve  |
           +---------+
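The diagram is a linear chain: each stage depends only on the one above it. As a small sketch of what that ordering means, the dependencies can be written as a graph and sorted topologically with the standard library (Python 3.9+). The lowercase stage names are illustrative labels read off the diagram and may not match the names used in the project's `dvc.yaml`.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on, as read off the diagram.
# Stage names here are illustrative, not taken from the project's dvc.yaml.
dependencies = {
    "prepare": set(),
    "feature_engine": {"prepare"},
    "train": {"feature_engine"},
    "pack": {"train"},
    "serve": {"pack"},
}

# A topological order lists every stage after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(" -> ".join(order))
```

Because every stage has exactly one upstream dependency, there is only one valid execution order, and a change to an early stage invalidates everything after it.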