A PyTorch implementation for the CIKM 2020 paper below:
Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters.
Yingtong Dou, Zhiwei Liu, Li Sun, Yutong Deng, Hao Peng, Philip S. Yu.
CAmouflage-REsistant Graph Neural Network (CARE-GNN) is a GNN-based fraud detector for multi-relation graphs. It is equipped with three modules that make it more robust against camouflaged fraudsters.
The three enhancement modules are:
- A label-aware similarity measure that computes similarity scores between a center node and its neighboring nodes;
- A similarity-aware neighbor selector that uses top-p sampling and reinforcement learning to select the optimal number of neighbors under each relation;
- A relation-aware neighbor aggregator that directly aggregates information from different relations, using the optimal neighbor selection thresholds as weights.
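The three modules can be illustrated with a minimal NumPy sketch. It is not the code in layers.py. It assumes each node already has a scalar fraud probability produced by a label predictor (the paper uses an MLP for this). The function names, the step size 0.02 and the clipping of p to [0, 1] are hypothetical choices made for illustration.

```python
import math

import numpy as np


def label_aware_distance(center_prob, neigh_probs):
    """L1 distance between the center node's predicted fraud probability
    and each neighbor's (assumed scalar outputs of a label predictor).
    Similarity is 1 - distance, so a smaller distance means more similar."""
    return np.abs(np.asarray(neigh_probs) - center_prob)


def top_p_select(neigh_ids, distances, p):
    """Top-p selection: keep the ceil(p * k) neighbors with the smallest
    label-aware distance (i.e. the highest similarity)."""
    k = max(1, math.ceil(p * len(neigh_ids)))
    order = np.argsort(distances, kind="stable")[:k]
    return [neigh_ids[i] for i in order]


def update_threshold(p, prev_avg_dist, curr_avg_dist, step=0.02):
    """RL-style update of one relation's threshold p: reward +1 if the
    average neighbor distance did not increase since the previous epoch,
    otherwise -1; then move p by +/- step, clipped to [0, 1]."""
    reward = 1 if curr_avg_dist <= prev_avg_dist else -1
    return min(1.0, max(0.0, p + reward * step))


def relation_aware_aggregate(relation_embs, thresholds):
    """Weighted sum of per-relation neighbor embeddings, using each
    relation's threshold p_r as its weight. A full layer would also
    combine this with the center node's own embedding (omitted here)."""
    return sum(p * h for p, h in zip(thresholds, relation_embs))


# Usage on made-up values: center probability 0.9, four neighbors.
dists = label_aware_distance(0.9, [0.8, 0.1, 0.95, 0.4])
kept = top_p_select(["a", "b", "c", "d"], dists, p=0.5)  # ["c", "a"]
```

Because the threshold moves by a fixed small step driven only by a +1/-1 reward, this kind of update needs no policy network, which is consistent with the efficiency claim below.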
CARE-GNN has the following advantages:
- Adaptability. CARE-GNN adaptively selects the best neighbors for aggregation given an arbitrary multi-relation graph;
- High efficiency. CARE-GNN is computationally efficient because it uses neither attention nor deep reinforcement learning;
- Flexibility. Many other neural modules and external knowledge can be plugged into CARE-GNN.
We have integrated more than eight GNN-based fraud detectors into a TensorFlow toolbox.
You can download the project and install the required packages using the following commands:
git clone https://github.com/YingtongDou/CARE-GNN.git
cd CARE-GNN
pip3 install -r requirements.txt
To run the code, you need Python 3.6 or later.
- In the CARE-GNN directory, run
unzip /data/Amazon.zip
and
unzip /data/YelpChi.zip
to unzip the datasets;
- Run
python data_process.py
to generate the adjacency lists used by CARE-GNN;
- Run
python train.py
to run CARE-GNN with the default settings.
For other datasets and parameter settings, please refer to the argument parser in train.py. The model supports both CPU and GPU modes.
To run CARE-GNN on your own datasets, you need to prepare the following data:
- Multiple single-relation graphs over the same set of nodes, each stored as a scipy.sparse matrix. You can use sparse_to_adjlist() in utils.py to convert each sparse matrix into the adjacency lists used by CARE-GNN;
- A NumPy array of node labels. Currently, CARE-GNN only supports binary classification;
- A node feature matrix stored in scipy.sparse matrix format.
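The input requirements above can be sketched with toy data. The helper names to_adjlist and check_inputs are hypothetical: to_adjlist only illustrates the kind of conversion sparse_to_adjlist() performs (here, a dict mapping each node to its neighbor set, with self-loops and undirected edges, which are assumptions) and is not the repository's function, which you should use for real runs. All values are made up.

```python
from collections import defaultdict

import numpy as np
import scipy.sparse as sp


def to_adjlist(relation_matrix):
    """Hypothetical stand-in for sparse_to_adjlist(): turn one N x N
    single-relation sparse matrix into {node: set(neighbors)}.
    Assumes undirected edges and adds a self-loop for every node."""
    n = relation_matrix.shape[0]
    rows, cols = (relation_matrix + sp.eye(n)).nonzero()
    adj = defaultdict(set)
    for u, v in zip(rows.tolist(), cols.tolist()):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def check_inputs(relation_matrices, labels, features):
    """Sanity checks mirroring the list above: every relation covers the
    same N nodes, features have one row per node, labels are binary."""
    n = labels.shape[0]
    if any(m.shape != (n, n) for m in relation_matrices):
        raise ValueError("all relation graphs must share the same N nodes")
    if features.shape[0] != n:
        raise ValueError("the feature matrix needs one row per node")
    if not set(np.unique(labels).tolist()) <= {0, 1}:
        raise ValueError("only binary labels (0/1) are supported")


# Toy data: 3 nodes and 2 relations.
r1 = sp.csr_matrix(np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]]))
r2 = sp.csr_matrix(np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]]))
labels = np.array([0, 1, 0])         # assumed encoding: 1 = fraud
features = sp.csr_matrix(np.eye(3))  # node features in scipy.sparse format

check_inputs([r1, r2], labels, features)
adj_lists = [to_adjlist(r) for r in (r1, r2)]
```

Keeping one adjacency list per relation, rather than merging the graphs, is what lets each relation get its own selection threshold.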
The repository is organized as follows:
- data/: dataset files;
- data_process.py: converts sparse matrices to adjacency lists;
- graphsage.py: model code for the vanilla GraphSAGE model;
- layers.py: CARE-GNN layer implementations;
- model.py: CARE-GNN model implementation;
- train.py: training and testing of all models;
- utils.py: utility functions for data I/O and model evaluation.
If you use our code, please cite the paper below:
@inproceedings{dou2020enhancing,
title={Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters},
author={Dou, Yingtong and Liu, Zhiwei and Sun, Li and Deng, Yutong and Peng, Hao and Yu, Philip S},
booktitle={Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM'20)},
year={2020}
}