The project aims to detect fraud by identifying unusual activities or patterns, such as signature forgery, credit card cloning, money laundering, and intentional bankruptcy declarations.
The server performs an in-depth analysis of the dataset; based on that analysis, the data was transformed so that only the relevant attributes are used for model training. We evaluated several fraud detection models, weighing the strengths and weaknesses of each. The frontend then uses the trained models through requests to the server.
git clone git@github.com:enzodpaiva/Deteccao-Fraude-pantanal.dev-Backend.git
Run the steps in tutorial.ipynb, then start the server container:
docker run --rm -p 3000:3000 --network deteccao-fraude-pantanaldev-api_fraud_network credit_card_fraud_detection:zyieiornuowrafis
Press Ctrl+C to stop the server.
Dataset used:
Kaggle - Credit Card Fraud Detection
Applied transformations:
- Log10 applied to transaction values
- Distribution of transactions by day, hour, and minute
- Pruning of unimportant attributes based on analysis of variance (ANOVA)
All models were trained and evaluated with 30, 25, and 20 attributes, using undersampling, oversampling, or SMOTE.
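The transformations above can be sketched in plain Python. This is a minimal, dependency-free illustration under stated assumptions, not the project's preprocessing code: the `+ 1` offset before `log10` (so zero amounts stay finite) is an assumption, `split_elapsed_time` assumes the time column holds seconds elapsed since the first transaction (as the Kaggle dataset describes its `Time` column), and all function names are hypothetical. `anova_f` computes the one-way ANOVA F statistic that scikit-learn's `f_classif` also reports; ranking attributes by it and keeping the top 20 or 25 is one way to do the pruning. `smote` shows the core SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def log10_amount(amount: float) -> float:
    # Assumption: add 1 before log10 so zero amounts stay finite (log10(0) is undefined).
    return math.log10(amount + 1.0)


def split_elapsed_time(seconds: float) -> tuple[int, int, int]:
    # Assumption: `seconds` counts from the first transaction in the dataset,
    # so day index, hour of day, and minute of hour follow by integer division.
    total_minutes = int(seconds) // 60
    day, minute_of_day = divmod(total_minutes, 24 * 60)
    hour, minute = divmod(minute_of_day, 60)
    return day, hour, minute


def anova_f(values: list[float], labels: list[int]) -> float:
    # One-way ANOVA F statistic of one attribute across the classes (fraud / legitimate).
    # A low F means the class means barely differ, making the attribute a pruning candidate.
    groups: dict[int, list[float]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2 for label, g in groups.items() for v in g)
    if ss_within == 0:
        # No spread inside the classes: perfectly separating, or constant if the means agree.
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def smote(minority: list[list[float]], n_new: int, k: int = 5, seed: int = 0) -> list[list[float]]:
    # SMOTE: each synthetic sample is a random point on the segment between a
    # minority-class sample and one of its k nearest minority-class neighbours.
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    print(log10_amount(99.0))          # 2.0
    print(split_elapsed_time(90_061))  # (1, 1, 1): day 1, 01:01
    print(anova_f([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # 121.5
    print(len(smote([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], n_new=4, k=2)))  # 4
```

In practice a library implementation (for example the SMOTE class in imbalanced-learn) would replace the quadratic neighbour search used here.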
Models evaluated:
- Decision trees with depths 3, 4, and 5
- XGBoost
Metrics used:
- Precision
- Recall
- Specificity
- F1 Score
- Geometric Mean
Neural networks, especially deeper ones, can achieve better metrics, but their complexity and black-box nature make it particularly hard to analyze and explain how and why a given transaction is classified.
On the other hand, decision trees are easy to analyze and explain, and when applied in ensemble methods like XGBoost, they can achieve metrics comparable to those of neural networks.
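Decision trees stay explainable because a fitted tree is just a set of nested threshold rules. The code below is a toy, dependency-free sketch and not the project's training code; a real pipeline would more likely use a library such as scikit-learn's `DecisionTreeClassifier` with `max_depth` set to 3, 4, or 5. It grows a depth-limited tree using Gini-impurity splits and prints it as if/else rules. The feature names `log_amount` and `hour` and the six training rows are hypothetical.

```python
from collections import Counter


def gini(labels):
    # Gini impurity: 0 for a pure node, higher when classes are mixed.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    # Try every (feature, threshold) pair and keep the lowest weighted impurity.
    best = None  # (weighted_gini, feature, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build(rows, labels, depth, max_depth):
    # A node is either a class label (leaf) or (feature, threshold, left, right).
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))


def rules(node, names, indent=""):
    # Render the tree as human-readable if/else rules.
    if not isinstance(node, tuple):
        return [f"{indent}predict {node}"]
    f, t, left, right = node
    return ([f"{indent}if {names[f]} <= {t}:"] + rules(left, names, indent + "  ")
            + [f"{indent}else:"] + rules(right, names, indent + "  "))


def predict(node, row):
    # Follow the thresholds down to a leaf.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


# Hypothetical rows: [log_amount, hour]; label 1 = fraud.
rows = [[1.0, 2], [3.0, 3], [3.2, 1], [3.1, 14], [2.9, 15], [1.1, 3]]
labels = [0, 1, 1, 0, 0, 0]
tree = build(rows, labels, depth=0, max_depth=3)
print("\n".join(rules(tree, ["log_amount", "hour"])))
```

The printed rules read as "large amounts at night are fraud", which is exactly the kind of explanation a neural network does not offer directly. Ensembles such as XGBoost combine many such trees, trading some of this readability for accuracy.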
Precision: because the dataset is imbalanced, it is a less representative measure of model quality.
Recall and specificity: highly representative of model quality for this application, since they evaluate each class separately and thus account for the imbalance in class composition.
F1 score: poorly representative of model quality on imbalanced datasets, because precision is one of its components.
Geometric mean: highly representative of model quality for this application, since it normalizes the imbalance between classes before combining them into a single score.
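The five metrics can be computed from a confusion matrix, treating fraud as the positive class. The sketch below uses a hypothetical confusion matrix (100 frauds among 100,000 transactions), not results from this project, to illustrate how the imbalance weighs on precision and F1: only 0.4% of legitimate transactions are flagged, yet those 400 false alarms outnumber the 80 caught frauds. The function names are illustrative.

```python
import math


def safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a class has no predictions or no samples.
    return num / den if den else 0.0


def fraud_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    # Positive class = fraud.
    precision = safe_div(tp, tp + fp)    # share of fraud alerts that are real frauds
    recall = safe_div(tp, tp + fn)       # share of frauds that were caught
    specificity = safe_div(tn, tn + fp)  # share of legitimate transactions left alone
    f1 = safe_div(2 * precision * recall, precision + recall)
    g_mean = math.sqrt(recall * specificity)  # balances the two per-class rates
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "f1": f1, "g_mean": g_mean}


# Hypothetical confusion matrix: 100 frauds among 100,000 transactions.
scores = fraud_metrics(tp=80, fp=400, tn=99_500, fn=20)
print({name: round(value, 3) for name, value in scores.items()})
# {'precision': 0.167, 'recall': 0.8, 'specificity': 0.996, 'f1': 0.276, 'g_mean': 0.893}
```

Recall, specificity, and the geometric mean each look at one class at a time, so they stay high here, while precision and F1 drop sharply because the false positives are drawn from the much larger legitimate class.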
📝 Ability to search past frauds.
📝 Implement authentication and access control to ensure user security.
📝 Add support for different types of data sources for fraud detection, such as social media feeds, additional financial transaction data, etc.
📝 Integrate the application with email or messaging notification services to alert users of suspicious activities.
📝 Implement a user feedback system to collect suggestions and continuously improve the application.
📝 Perform rigorous performance testing to ensure the application can handle large volumes of data efficiently.
📝 Integrate the application with third-party systems, such as databases, to obtain additional information for fraud analysis.
- Enzo Paiva
- Alexandre Shimizu
- Eduardo Lopes
- Vitor Yuske
The MIT License (MIT)
Copyright © 2023 - Data Wizard - Back-end