This repository contains the notebook for reproducing the fraud detection analysis covered in this blog series. In summary, the analysis uses Neo4j and Graph Data Science (GDS) to explore an anonymized data sample from a peer-to-peer (P2P) payment platform. The notebook is split into the following sections, mirroring the blog series, each covering a stage of the graph data science workflow:
- Part 1: Exploring Connected Fraud Data
- Part 2: Resolving Fraud Communities using Entity Resolution and Community Detection
- Part 3: Recommending Suspicious Accounts With Centrality & Node Similarity
- Part 4: Predicting Fraud Risk Accounts with Machine Learning
To run the notebook, you will need a copy of the dataset, which is available as a Neo4j dump file in this folder. The folder also contains a README with more details on the dataset and directions for loading the data into Neo4j if you are unfamiliar with the load process. It also contains the ODC-BY license for the dataset (separate from the license in this repository).
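
Once the dump is loaded, a quick sanity check from Python can confirm that the database is reachable and populated before you open the notebook. The following is a minimal sketch, not part of the notebook itself. It assumes the official `neo4j` Python driver is installed (`pip install neo4j`) and that the database runs locally; the URI, user name, and password are placeholders, and the helper names `summarize_labels` and `fetch_label_counts` are hypothetical.

```python
# Placeholder connection settings (assumptions): replace with your own values.
NEO4J_URI = "bolt://localhost:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "password"

# Generic Cypher query: count nodes per label combination.
# It makes no assumption about the labels used in the dataset.
LABEL_COUNT_QUERY = (
    "MATCH (n) "
    "RETURN labels(n) AS labels, count(*) AS node_count "
    "ORDER BY node_count DESC"
)


def summarize_labels(records):
    """Convert records with 'labels' and 'node_count' keys into a dict."""
    return {tuple(record["labels"]): record["node_count"] for record in records}


def fetch_label_counts(uri, user, password):
    """Connect to Neo4j and return node counts keyed by label combination."""
    # Imported here so the module can be loaded without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        # Raises an exception if the server cannot be reached.
        driver.verify_connectivity()
        with driver.session() as session:
            result = session.run(LABEL_COUNT_QUERY)
            # Consume the result while the session is still open.
            return summarize_labels(result)
```

Calling `fetch_label_counts(NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD)` should return a non-empty dictionary if the dump loaded correctly; an empty dictionary suggests the data was loaded into a different database or not at all.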