CARE-GNN

A PyTorch implementation for the CIKM 2020 paper below:
Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters.
Yingtong Dou, Zhiwei Liu, Li Sun, Yutong Deng, Hao Peng, Philip S. Yu.

Overview

CAmouflage-REsistant Graph Neural Network (CARE-GNN) is a GNN-based fraud detector that operates on multi-relation graphs and is equipped with three modules that enhance its performance against camouflaged fraudsters.

The three enhancement modules are:

A label-aware similarity measure which measures the similarity scores between a center node and its neighboring nodes;
A similarity-aware neighbor selector which leverages top-p sampling and reinforcement learning to select the optimal number of neighbors under each relation;
A relation-aware neighbor aggregator which directly aggregates information from different relations using the optimal neighbor selection thresholds as weights.
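The way the three modules fit together can be illustrated for a single center node with a small NumPy sketch. It is not the repository's implementation. It assumes, as in the paper, that similarity is computed from predicted label probabilities, and it further assumes a one-layer label predictor (`w`, `b`) with a sigmoid output, `ceil` rounding in top-p selection, and a mean aggregator without learnable weight matrices. The function names `label_prob`, `similarity`, `top_p_select`, and `aggregate` are hypothetical.

```python
import math

import numpy as np


def label_prob(h, w, b):
    # Assumption: a one-layer predictor with a sigmoid output stands in for the label predictor.
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))


def similarity(h_center, h_neighbors, w, b):
    """Label-aware similarity: 1 minus the L1 distance of predicted fraud probabilities."""
    p_center = label_prob(h_center, w, b)      # scalar
    p_neigh = label_prob(h_neighbors, w, b)    # shape (k,)
    return 1.0 - np.abs(p_neigh - p_center)


def top_p_select(scores, p):
    """Indices of the ceil(p * k) most similar neighbors (at least one is kept)."""
    k = max(1, math.ceil(p * len(scores)))
    return np.argsort(-scores)[:k]


def aggregate(h_center, neighbors_by_rel, thresholds, w, b):
    """Relation-aware aggregation (hypothetical simplification, no learnable weights).

    Each relation's selected neighbors are mean-pooled, scaled by that relation's
    threshold p_r, summed across relations, concatenated with the center embedding,
    and passed through a ReLU.
    """
    combined = np.zeros_like(h_center)
    for rel, h_neigh in neighbors_by_rel.items():
        scores = similarity(h_center, h_neigh, w, b)
        keep = top_p_select(scores, thresholds[rel])
        combined += thresholds[rel] * h_neigh[keep].mean(axis=0)
    return np.maximum(np.concatenate([h_center, combined]), 0.0)


rng = np.random.default_rng(0)
d = 4
w, b = rng.normal(size=d), 0.0
h_v = rng.normal(size=d)
neighbors = {"r1": rng.normal(size=(5, d)), "r2": rng.normal(size=(3, d))}
out = aggregate(h_v, neighbors, {"r1": 0.5, "r2": 0.8}, w, b)
print(out.shape)  # (8,)
```

Using the thresholds as weights means a relation with a higher p_r both keeps more of its neighbors and contributes more to the combined embedding; the real model wraps learnable layers around these steps.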

CARE-GNN has the following advantages:

Adaptability. CARE-GNN adaptively selects the best neighbors for aggregation given an arbitrary multi-relation graph;
High efficiency. CARE-GNN is computationally efficient because it uses neither attention nor deep reinforcement learning;
Flexibility. Many other neural modules and external knowledge can be plugged into CARE-GNN.
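The adaptive neighbor selection behind the Adaptability point can be illustrated with a simple reward-driven update of each relation's threshold p_r, in place of a deep reinforcement learning agent. If the average distance between center nodes and their selected neighbors under a relation does not increase from one epoch to the next, p_r moves up by a fixed step; otherwise it moves down. This sketch follows that idea loosely. The function name `update_thresholds`, the default step of 0.02, and clipping p_r to [0, 1] are assumptions, not values taken from the repository.

```python
def update_thresholds(thresholds, prev_avg_dist, curr_avg_dist, step=0.02):
    """Hypothetical per-relation threshold update.

    All three dicts are keyed by relation name. The reward is +1 when the
    average neighbor distance did not increase since the previous epoch and
    -1 otherwise; the action shifts the threshold by +/- step, clipped to [0, 1].
    """
    new_thresholds = {}
    for rel, p in thresholds.items():
        reward = 1 if prev_avg_dist[rel] - curr_avg_dist[rel] >= 0 else -1
        new_thresholds[rel] = min(1.0, max(0.0, p + reward * step))
    return new_thresholds


p = {"r1": 0.5, "r2": 0.99}
prev = {"r1": 0.40, "r2": 0.30}
curr = {"r1": 0.35, "r2": 0.45}
# r1's distance shrank, so its threshold rises by one step; r2's grew, so it falls by one step.
print(update_thresholds(p, prev, curr))
```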

We have integrated more than eight GNN-based fraud detectors as a TensorFlow toolbox.

Setup

You can download the project and install the required packages using the following commands:

git clone https://github.com/YingtongDou/CARE-GNN.git
cd CARE-GNN
pip3 install -r requirements.txt

The code requires Python 3.6 or later.

Running

In the CARE-GNN directory, run unzip data/Amazon.zip and unzip data/YelpChi.zip to extract the datasets;
Run python data_process.py to generate adjacency lists used by CARE-GNN;
Run python train.py to run CARE-GNN with default settings.

For other datasets and parameter settings, refer to the argument parser in train.py. The model runs in both CPU and GPU modes.

Running on your datasets

To run CARE-GNN on your datasets, you need to prepare the following data:

Multiple single-relation graphs over the same set of nodes, each stored as a scipy.sparse matrix. You can use sparse_to_adjlist() in utils.py to convert each sparse matrix into the adjacency lists used by CARE-GNN;
A numpy array with node labels. Currently, CARE-GNN only supports binary classification;
A node feature matrix stored in scipy.sparse matrix format.
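A rough sketch of preparing these three inputs for a toy graph with two relations follows. The function `to_adjlist` is a hypothetical stand-in for sparse_to_adjlist() in utils.py: it adds a self-loop for every node and records each edge in both directions, which is an assumption about what the repository's function does, and it returns the adjacency lists instead of writing them to disk. Check utils.py for the actual behavior and output format.

```python
from collections import defaultdict

import numpy as np
import scipy.sparse as sp


def to_adjlist(sp_matrix):
    """Hypothetical conversion of one single-relation graph into adjacency lists.

    Assumption: self-loops are added and edges are treated as undirected.
    """
    n = sp_matrix.shape[0]
    adj = sp_matrix + sp.eye(n, format="csr")
    adj_lists = defaultdict(set)
    rows, cols = adj.nonzero()
    for u, v in zip(rows.tolist(), cols.tolist()):
        adj_lists[u].add(v)
        adj_lists[v].add(u)
    return adj_lists


# Toy data: 4 nodes and two relations defined over the same node set.
n_nodes = 4
rel_a = sp.csr_matrix(([1, 1], ([0, 1], [1, 2])), shape=(n_nodes, n_nodes))
rel_b = sp.csr_matrix(([1], ([2], [3])), shape=(n_nodes, n_nodes))
labels = np.array([0, 1, 0, 1])  # binary labels only
features = sp.csr_matrix(np.random.default_rng(0).random((n_nodes, 3)))

adj_lists = [to_adjlist(m) for m in (rel_a, rel_b)]
print(sorted(adj_lists[0][1]))  # [0, 1, 2]
```

Every relation matrix, the label array, and the feature matrix must agree on the number of nodes, since all relations share one node set.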

Repo Structure

The repository is organized as follows:

data/: dataset files;
data_process.py: converts sparse matrices into adjacency lists;
graphsage.py: model code for the vanilla GraphSAGE model;
layers.py: CARE-GNN layer implementations;
model.py: CARE-GNN model implementation;
train.py: training and testing for all models;
utils.py: utility functions for data I/O and model evaluation.

Citation

If you use our code, please cite the paper below:

@inproceedings{dou2020enhancing,
  title={Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters},
  author={Dou, Yingtong and Liu, Zhiwei and Sun, Li and Deng, Yutong and Peng, Hao and Yu, Philip S},
  booktitle={Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM'20)},
  year={2020}
}
