Open Fraud Detection Kit

Welcome to the Open Fraud Detection Kit project. This project aims to provide an open-source toolkit for managing and updating fraud detection AI models.

Author: Tianyi Lu

To test the functionality:
- Make a GET request to http://127.0.0.1:5000/predict while the Flask server is running (see Getting Started).
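
The request can be sketched with the Python standard library as follows. This is a minimal illustration, not part of the project: the helper names build_predict_url and fetch_prediction are hypothetical, and because the response format of /predict is not documented here, the body is returned as raw text.

```python
from urllib.request import urlopen


def build_predict_url(host="127.0.0.1", port=5000):
    """Return the URL of the /predict endpoint (defaults match this README)."""
    return "http://{}:{}/predict".format(host, port)


def fetch_prediction(url=None, timeout=10):
    """Send a GET request and return (status_code, body_text).

    The body is returned undecoded beyond UTF-8 text, since the response
    schema is not specified in this README.
    """
    if url is None:
        url = build_predict_url()
    with urlopen(url, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

With the server running, calling fetch_prediction() returns the HTTP status code and the response body as a string.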

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

What you need to install the software and how to install it:

You need Python < 3.11 (we used 3.7; see the guide to building older versions of Python).
Clone the repo: git clone
Download the model from Google Drive, name it model.pkl, and put it in the root of the project.
Install the project libraries from requirements.txt: pip install -r requirements.txt
Run the Flask server: python app.py
Finish!
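
Before starting the server, the prerequisites above can be checked with a short script. This is only a sketch under the assumptions stated in this section (Python older than 3.11, and model.pkl, requirements.txt, and app.py in the project root); the function names are hypothetical and not part of the project.

```python
import os
import sys


def python_version_ok(version_info=None):
    """Return True if the interpreter is older than Python 3.11."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) < (3, 11)


def missing_files(project_root=".",
                  required=("model.pkl", "requirements.txt", "app.py")):
    """Return the required files that are absent from the project root."""
    return [name for name in required
            if not os.path.isfile(os.path.join(project_root, name))]
```

If python_version_ok() is True and missing_files() returns an empty list, the setup steps above have most likely been completed.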

Usage

Navigate to the Macros directory to access the Alteryx macros.
Use the macros in your Alteryx workflows as necessary.
The 'Samples' directory provides examples of how to use the macros.
The 'TestMacros' directory contains macros that are used for testing purposes.

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Acknowledgments

Thanks to Alteryx for providing the platform that this project is built upon.
Thanks to all contributors who have helped to evolve this project.

To run

Installation

python setup.py install

Requirements

* python 3.6, 3.7
* tensorflow>=1.14.0,<2.0
* numpy>=1.16.4
* scipy>=1.2.0
* networkx<=1.11

Datasets

DBLP

We use the pre-processed DBLP dataset from Jhy1993/HAN. You can run FdGars, Player2Vec, GeniePath, and GEM on the DBLP dataset. Unzip the archive before using the dataset:

cd dataset
unzip DBLP4057_GAT_with_idx_tra200_val_800.zip
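
On systems without the unzip command, the same step can be done from Python. The sketch below uses the standard zipfile module; unzip_dataset is a hypothetical helper name, not a function from this repository.

```python
import zipfile


def unzip_dataset(archive_path, dest_dir="."):
    """Extract every member of the archive into dest_dir and return their names."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

For example, run unzip_dataset("DBLP4057_GAT_with_idx_tra200_val_800.zip") from inside the dataset directory.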

Example dataset

We implement example graphs for SemiGNN, GAS, and GEM in data_loader.py because those models require unique graph structures or node types that cannot be found in open-source datasets.

Yelp dataset

For GraphConsis, we preprocessed the Yelp Spam Review Dataset, with reviews as nodes and three relations as edges.

The dataset, in .mat format, is located at /dataset/YelpChi.zip. The .mat file includes:

net_rur, net_rtr, net_rsr: three sparse matrices representing the three homo-graphs defined in the GraphConsis paper;
features: a sparse matrix of 32-dimensional handcrafted features;
label: a numpy array with the ground truth of nodes. 1 represents spam and 0 represents benign.
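
The fields listed above can be read with scipy.io.loadmat, as in the following sketch. The helper names load_yelpchi and count_labels are hypothetical, and the path of the extracted .mat file is an assumption (the README only gives the location of the zip archive).

```python
from collections import Counter


def load_yelpchi(mat_path):
    """Load the .mat file and return the fields described above as a dict.

    SciPy is imported inside the function so that count_labels below can be
    used without it; scipy is already listed in the requirements.
    """
    from scipy.io import loadmat

    data = loadmat(mat_path)
    return {
        "relations": [data["net_rur"], data["net_rtr"], data["net_rsr"]],
        "features": data["features"],
        # loadmat returns 2-D arrays, so flatten the labels to one dimension.
        "labels": data["label"].flatten(),
    }


def count_labels(labels):
    """Count spam (1) and benign (0) nodes in an iterable of labels."""
    counts = Counter(int(label) for label in labels)
    return {"spam": counts[1], "benign": counts[0]}
```

After unzipping the archive, count_labels(load_yelpchi(path)["labels"]) gives a quick check of the class balance, where path points to the extracted .mat file.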

The YelpChi data preprocessing details can be found in our CIKM'20 paper. To get the complete metadata of the Yelp dataset, please email ytongdou@gmail.com.

User Guide

Running the example code

You can find the implemented models in the algorithms directory. For example, you can run Player2Vec using:

python Player2Vec_main.py

You can specify model parameters when running the code.

Running on your datasets

Have a look at the load_data_dblp() function in utils/utils.py for an example.

To use your own data, you have to provide:

adjacency matrices or adjlists (for GAS);
a feature matrix;
a label matrix.

Then split the feature matrix and label matrix into training data and testing data.
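
The train/test split can be sketched with plain Python as shown below. This is an illustration only and not the splitting logic used by load_data_dblp(); the function names, the default test ratio of 0.2, and the list-of-lists matrix representation are all assumptions.

```python
import random


def split_indices(num_nodes, test_ratio=0.2, seed=0):
    """Shuffle node indices and split them into (train, test) lists."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    indices = list(range(num_nodes))
    random.Random(seed).shuffle(indices)  # seeded for reproducible splits
    num_test = int(round(num_nodes * test_ratio))
    return indices[num_test:], indices[:num_test]


def take_rows(matrix, indices):
    """Select rows of a list-of-lists matrix by index."""
    return [matrix[i] for i in indices]
```

Applying take_rows with the same index lists to both the feature matrix and the label matrix keeps each node's features aligned with its label.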

You can specify a dataset as follows:

python xx_main.py --dataset your_dataset

or by editing xx_main.py
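
A --dataset option like the one in the command above could be parsed as in this sketch. It uses the standard argparse module purely for illustration; the actual xx_main.py scripts may define the flag differently, and the default value "dblp" is an assumption.

```python
import argparse


def build_parser():
    """Build a parser exposing a --dataset option (hypothetical entry point)."""
    parser = argparse.ArgumentParser(description="Hypothetical model entry point")
    parser.add_argument("--dataset", type=str, default="dblp",
                        help="name of the dataset to load (default is assumed)")
    return parser
```

Calling build_parser().parse_args(["--dataset", "your_dataset"]) yields a namespace whose dataset attribute holds the chosen name.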

The structure of code

The repository is organized as follows:

algorithms/ contains the implemented models and the corresponding example code;
base_models/ contains the basic models (GCN);
dataset/ contains the necessary dataset files;
utils/ contains:
- code for loading and splitting the data (data_loader.py);
- various utilities (utils.py).

Implemented Models

Model	Paper	Venue	Reference
SemiGNN	A Semi-supervised Graph Attentive Network for Financial Fraud Detection	ICDM 2019	BibTex
Player2Vec	Key Player Identification in Underground Forums over Attributed Heterogeneous Information Network Embedding Framework	CIKM 2019	BibTex
GAS	Spam Review Detection with Graph Convolutional Networks	CIKM 2019	BibTex
FdGars	FdGars: Fraudster Detection via Graph Convolutional Networks in Online App Review System	WWW 2019	BibTex
GeniePath	GeniePath: Graph Neural Networks with Adaptive Receptive Paths	AAAI 2019	BibTex
GEM	Heterogeneous Graph Neural Networks for Malicious Account Detection	CIKM 2018	BibTex
GraphSAGE	Inductive Representation Learning on Large Graphs	NIPS 2017	BibTex
GraphConsis	Alleviating the Inconsistency Problem of Applying Graph Neural Network to Fraud Detection	SIGIR 2020	BibTex
HACUD	Cash-Out User Detection Based on Attributed Heterogeneous Information Network with a Hierarchical Attention Mechanism	AAAI 2019	BibTex

Model Comparison

Model	Application	Graph Type	Base Model
SemiGNN	Financial Fraud	Heterogeneous	GAT, LINE, DeepWalk
Player2Vec	Cyber Criminal	Heterogeneous	GAT, GCN
GAS	Opinion Fraud	Heterogeneous	GCN, GAT
FdGars	Opinion Fraud	Homogeneous	GCN
GeniePath	Financial Fraud	Homogeneous	GAT
GEM	Financial Fraud	Heterogeneous	GCN
GraphSAGE	Opinion Fraud	Homogeneous	GraphSAGE
GraphConsis	Opinion Fraud	Heterogeneous	GraphSAGE
HACUD	Financial Fraud	Heterogeneous	GAT

Himoda/open-fraud-detection-kit