Welcome to this repository, where I used Apache Spark for distributed processing and XGBoost for fraud detection. The data comes from here. I used only the "Base.csv" file to test my model. The classification model achieves an area under the ROC curve (AreaUnderROC) of 85%.
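For readers unfamiliar with the metric, here is a minimal sketch of what the area under the ROC curve measures: the probability that a randomly chosen fraud case gets a higher score than a randomly chosen legitimate one. This is a plain Python illustration, not the code from the notebook. The function name `area_under_roc` and the toy labels and scores are made up for the example, and the notebook most likely relies on Spark's own evaluator instead.

```python
from typing import Sequence


def area_under_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for fraud, 0 for legitimate.
    scores: model scores, higher meaning "more likely fraud".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # fraud case ranked above the legitimate one
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, not taken from Base.csv.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
auc = area_under_roc(labels, scores)
print(f"AreaUnderROC = {auc:.3f}")
```

The pairwise loop is quadratic, so it only suits small samples; it is here to make the definition concrete. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 85% in context.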
rezeroworld/Bank-Fraud-Detection-PySpark
Using Apache Spark to detect fraud in Python
Jupyter Notebook