Data Wizard - Server

Project Description

This project detects fraud by identifying unusual activities or patterns, such as signature forgery, credit card cloning, money laundering, and intentional bankruptcy declarations.

The server performs an in-depth analysis of the dataset and, based on that analysis, transforms the data so that only relevant parameters are used in model training. We evaluated several fraud detection models and weighed the strengths and weaknesses of each one. The frontend then uses the resulting models through calls to the server.

How to Run the Application ▶️

In the terminal, clone the project:

git clone git@github.com:enzodpaiva/Deteccao-Fraude-pantanal.dev-Backend.git

Create a .env file in the project root based on .env.example

Follow the instructions in tutorial.ipynb

Run the application using Docker on the API network

docker run --rm -p 3000:3000 --network deteccao-fraude-pantanaldev-api_fraud_network credit_card_fraud_detection:zyieiornuowrafis

Shut down the application running in Docker

Ctrl+C

Dataset Manipulation

Dataset used:

Kaggle - Credit Card Fraud Detection

Applied manipulations:

Log10 applied to the values
Distribution of transactions according to day, hour, and minute
Pruning of unimportant attributes based on their analysis of variation

All models were trained and analyzed with 30, 25, and 20 attributes, with under-sampling, over-sampling, or SMOTE.
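The manipulations above can be sketched in plain Python. This is a minimal illustration, not the code from tutorial.ipynb: it assumes the dataset's Time column holds seconds elapsed since the first transaction, that Amount is the value being log-scaled, and that Class is 1 for fraud. The +1 offset inside the logarithm, the function names, and the fixed seed are all hypothetical choices made for this example.

```python
import math
import random

SECONDS_PER_DAY = 24 * 60 * 60


def log10_amount(amount: float) -> float:
    """Compress the scale of a transaction value with log10.

    The +1 offset is an assumption so that zero-value transactions
    map to 0 instead of raising a math domain error.
    """
    return math.log10(amount + 1.0)


def split_elapsed_seconds(elapsed: float) -> dict:
    """Break elapsed seconds into day, hour-of-day, and minute-of-hour."""
    total = int(elapsed)
    day, within_day = divmod(total, SECONDS_PER_DAY)
    hour, within_hour = divmod(within_day, 3600)
    minute = within_hour // 60
    return {"day": day, "hour": hour, "minute": minute}


def random_undersample(rows: list, label_key: str = "Class", seed: int = 0) -> list:
    """Keep every fraud row and an equally sized random sample of legitimate rows."""
    fraud = [r for r in rows if r[label_key] == 1]
    legit = [r for r in rows if r[label_key] == 0]
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    kept = rng.sample(legit, k=min(len(fraud), len(legit)))
    return fraud + kept


# One made-up row: 1 day, 1 hour, 1 minute, and 1 second after the first transaction.
row = {"Time": 90061.0, "Amount": 99.0, "Class": 0}
features = {"log_amount": log10_amount(row["Amount"]), **split_elapsed_seconds(row["Time"])}
print(features)
```

Over-sampling and SMOTE follow the same idea in the opposite direction: instead of discarding legitimate rows, they duplicate or synthesize fraud rows until the classes are balanced.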

Analyzed Models

Decision trees with depths 3, 4, and 5
XGBoost

Used Metrics

Precision
Recall
Specificity
F1 Score
Geometric Mean
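
For reference, all five metrics can be derived from the four cells of a binary confusion matrix. The sketch below assumes fraud is the positive class and that "geometric mean" refers to the square root of recall times specificity, which is the usual definition in imbalanced classification. The function name and the example counts are made up for illustration.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the five metrics from confusion-matrix counts.

    tp/fn count fraudulent transactions (caught / missed);
    tn/fp count legitimate transactions (passed / wrongly flagged).
    A metric whose denominator is zero is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    g_mean = math.sqrt(recall * specificity)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical counts: 100 frauds among 10,000 transactions.
metrics = classification_metrics(tp=80, fp=20, tn=9880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```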

Insights from Analysis

Models:

Neural networks, especially deeper ones, can achieve better metrics, but their complexity and black-box nature make it particularly challenging to analyze and explain how and why a given transaction was classified.

On the other hand, decision trees are easy to analyze and explain, and when applied in ensemble methods like XGBoost, they can achieve metrics comparable to those of neural networks.

Metrics:

Precision

Because of the dataset's imbalance, precision is a less illustrative metric of model quality.

Recall and Specificity

Highly illustrative metrics of model quality for this application as they analyze each classification group individually, thus addressing the imbalance in group composition.

F1 Score

Weakly representative of model quality on imbalanced datasets, because precision is one of its components.

Geometric Mean

Highly illustrative metric of model quality for this application as it normalizes the imbalance between different groups before evaluating model quality.
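
The effect of imbalance on precision can be checked with a short calculation. The sketch below holds a hypothetical model's recall and specificity fixed and varies only the share of fraudulent transactions (the prevalence). The numbers are illustrative and are not results from this project.

```python
def precision_at_prevalence(recall: float, specificity: float, prevalence: float) -> float:
    """Precision implied by a fixed recall and specificity at a given fraud share.

    true positives  = recall * prevalence
    false positives = (1 - specificity) * (1 - prevalence)
    """
    true_pos = recall * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same hypothetical model (recall 0.90, specificity 0.99), different class ratios.
for prevalence in (0.5, 0.1, 0.01, 0.001):
    p = precision_at_prevalence(recall=0.90, specificity=0.99, prevalence=prevalence)
    print(f"fraud share {prevalence:>6.3f} -> precision {p:.3f}")
```

Recall and specificity, and therefore their geometric mean, stay the same in every row, while precision drops as fraud becomes rarer. F1 inherits that sensitivity because it is built from precision.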

Languages, Dependencies, and Libraries Used 📚

Future Improvements We Aim to Implement

📝 Add the ability to search for past frauds.

📝 Implement authentication and access control to ensure user security.

📝 Add support for different types of data sources for fraud detection, such as social media feeds, additional financial transaction data, etc.

📝 Integrate the application with email or messaging notification services to alert users of suspicious activities.

📝 Implement a user feedback system to collect suggestions and continuously improve the application.

📝 Perform rigorous performance testing to ensure the application can handle large volumes of data efficiently.

📝 Integrate the application with third-party systems, such as databases, to obtain additional information for fraud analysis.

Developers

Enzo Paiva, Alexandre Shimizu, Eduardo Lopes, Vitor Yuske

License

The MIT License (MIT)

enzodpaiva/Fraud-Detection-Server-pantanal.dev