NebulaGraph-Fraud-Detection-GNN

An example project that trains a GraphSAGE model and sets up a real-time fraud detection web service (frontend and backend) with NebulaGraph Database and DGL.

Primary language: Jupyter Notebook. License: Apache-2.0.

Arch and Flow

GraphSAGE_FraudDetection

Model Training

Check https://github.com/wey-gu/NebulaGraph-Fraud-Detection-GNN/tree/main/notebooks/Train_GraphSAGE.ipynb for details.

  • Input: Graph of Historical Yelp Reviews
  • Output: a GraphSAGE node classification model, which can be used inductively (on nodes not seen during training)
                     ┌──────────────────────────────────────────────┐                     
                     │   ┌──────────────────────────────────────┐   │                     
                     │   │     Graph of Historical Reviews      │   │                     
                     │   └──────────────────────────────────────┘   │                     
                     │                      .─.              .      │                     
                     │                     (   )◀───────────( )     │                     
                     │                      `─'              '      │                     
                     │     .       .─.       ╲             ◁        │                     
                     │    ( )◀────(   )       ╲        .  ╱         │                     
                     │     '       `─'         ╲      ( )╱          │                     
                     │     ╲       ◀            ╲      '            │                     
                     │      ╲  .  ╱              ◁                  │                     
                     │       ◀( )╱               .─.         .─.    │                     
                     │         '                (   )◀──────(   )   │                     
                     │                           `─'         `─'    │                     
                     │                                              │                     
                     └──────────────────────────────────────────────┘                     
                                             ┃   (Nebula-DGL: NebulaLoader)                                         
                                             ▼                                            
┌────────────────────────────────────────────────────────────────────────────────────────┐
│ ┌────┐           ┌ ─ ─ ─ ─ ─ ─ ─ ─               ┌ ─ ─ ─ ─ ─ ─ ─ ─                     │
│ │GNN │                            │                               │                    │
│ └────┘           │                               │                                     │
│                                   │                               │                    │
│                  │           ◀                   │           ◀                         │
│                      .   .  ╱.─.  │                  .   .  ╱.─.  │                    │
│                  │  ( )◀───╱(   )                │  ( )◀───╱(   )                      │
│ .       .─.          '╱  '   `─'  │  ┌────────┐      '╱  '   `─'  │        .       .─. │
│( )◀────(   )     │   .       .─.     │ ReLU  ╱│  │   .       .─.          ( )◀────(   )│
│ '       `─'         ( )◀────(   ) │  │      ╱ │     ( )◀────(   ) │        '       `─' │
│ ╲       ◀   ══▶  │   '       `─'     │     ╱  │  │   '       `─'   ... ══▶ ╲       ◀   │
│  ╲  .  ╱             ╲       ◀    │  │─────   │      ╲       ◀    │         ╲  .  ╱    │
│   ◀( )╱          │    ╲  .  ╱        └────────┘  │    ╲  .  ╱                ◀( )╱     │
│     '                  ◀( )╱      │                    ◀( )╱      │            '       │
│                  │       '                       │       '                             │
│                      .       .─.  │                  .       .─.  │                    │
│                  │  ( )◀────(   )                │  ( )◀────(   )                      │
│                      '       `─'  │                  '       `─'  │                    │
│                  │   ╲       ◀                   │   ╲       ◀                         │
│                       ╲  .  ╱     │                   ╲  .  ╱     │                    │
│                  │     ◀( )╱                     │     ◀( )╱                           │
│                          '        │                      '        │                    │
│                  └ ─ ─ ─ ─ ─ ─ ─ ─               └ ─ ─ ─ ─ ─ ─ ─ ─                     │
└────────────────────────────────────────────────────────────────────────────────────────┘
                                            ┃                                             
                                            ▼                                             
                          ┌──────────────────────────────────┐                            
                          │                 Λ                │                            
                      ┌───┴─┐  GNN Model   ╱ ╲            ┌──┴──┐                         
                      ├─────┤             ╱   ╲           ├─────┤                         
                      ├─────┼────────────▶    ───────────▶├─────┤                         
                      ├─────┤             ╲   ╱           ├─────┤                         
                      └───┬─┘              ╲ ╱            └──┬──┘                         
                          │                 V                │                            
                          └──────────────────────────────────┘      
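
To make the "could be inductive" point concrete, here is a minimal, dependency-free sketch of a single GraphSAGE layer with a mean aggregator. It is not the notebook's code (which uses DGL); the function name `sage_mean_layer`, the toy graph and the weight values are all assumptions made up for illustration.

```python
from typing import Dict, List

Vector = List[float]


def matvec(w: List[Vector], x: Vector) -> Vector:
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def sage_mean_layer(
    feats: Dict[int, Vector],
    neighbors: Dict[int, List[int]],
    w_self: List[Vector],
    w_neigh: List[Vector],
) -> Dict[int, Vector]:
    """h_v = ReLU(W_self * x_v + W_neigh * mean(x_u for u in N(v)))."""
    out: Dict[int, Vector] = {}
    for v, x_v in feats.items():
        nbrs = neighbors.get(v, [])
        dim = len(x_v)
        if nbrs:
            mean = [sum(feats[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
        else:
            mean = [0.0] * dim  # isolated node: no neighbor message
        z = [a + b for a, b in zip(matvec(w_self, x_v), matvec(w_neigh, mean))]
        out[v] = [max(0.0, zi) for zi in z]  # ReLU
    return out


# Toy graph: review 2 aggregates messages from reviews 0 and 1.
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {2: [0, 1]}
w_self = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical "trained" weights
w_neigh = [[0.5, 0.0], [0.0, -4.0]]  # hypothetical "trained" weights

hidden = sage_mean_layer(feats, neighbors, w_self, w_neigh)

# Inductive use: a review that did not exist at training time is embedded
# with the very same weights, using only its features and its neighbors.
feats[3] = [2.0, 0.0]
neighbors[3] = [2]
hidden_new = sage_mean_layer(feats, neighbors, w_self, w_neigh)[3]
```

Because the layer only learns how to combine a node's features with its neighbors' features, nothing in the weights is tied to a specific node ID, which is why a new review can be scored without retraining.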

Online Fraud Inference System

Backend

  • Input: a new review
  • Output: is_fraud prediction
  • Flow:
    0. The review is inserted into NebulaGraph
    1. A SubGraph query is issued to fetch the new review's neighborhood
    2. The SubGraph is sent to the Inference API
    3. The Inference API predicts the review's is_fraud label with the trained model
      ┌─────────────────────┐                          ┌─────────────────┐      
      │                     │                          │                 │
─────▶│ Transaction Record  ├──────2. Fraud Risk ─────▶│  Inference API  │◀────┐
      │                     │◀────Prediction with ─────┤                 │     │
      │                     │        Sub Graph.        │                 │     │
      └─────────────────────┘                          └─────────────────┘     │
           │           ▲                                        │              │
           │           │                                        │              │
       0. Insert   1. Get New                              3.req: Node         │
         Record.   Record Sub                            Classification.       │
           │         Graph.                                     │              │
           ▼           │                                        │              │
┌──────────────────────┴─────────────────┐ ┌────────────────────┘      3.resp: │
│┌──────────────────────────────────────┐│ │                          Predicted│
││   Graph of Historical Transactions   ││ │                             Risk. │
│└──────────────────────────────────────┘│ │                                   │
│                   .─.              .   │ │                                   │
│                  (   )◀───────────( )  │ │                                   │
│                   `─'              '   │ │      ┌──────────────────────┐     │
│  .       .─.       ╲             ◁     │ │      │ GNN Model Λ          │     │
│ ( )◀────(   )       ╲           ╱      │ │  ┌───┴─┐        ╱ ╲      ┌──┴──┐  │
│  '       `─'         ╲       . ╱       │ │  ├─────┤       ╱   ╲     ├─────┤  │
│  ╲       ◀            ╲     ( )        │ └─▶├─────┼─────▶▕     ─────├─────┤──┘
│   ╲  .  ╱              ◁     '         │    ├─────┤       ╲   ╱     ├─────┤   
│    ◀( )╱               .─.         .─. │    └───┬─┘        ╲ ╱      └──┬──┘   
│      '                (   )◀──────(   )│        │           V          │      
│                        `─'         `─' │        └──────────────────────┘      
└────────────────────────────────────────┘                                      
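
The four steps of the flow above can be sketched as follows. This is only an illustration under stated assumptions: `InMemoryGraph` is a hypothetical stand-in for NebulaGraph, `toy_predict` is a hypothetical stand-in for the trained GNN behind the Inference API, and the 2-hop subgraph depth and the 0.5 threshold are arbitrary choices, not values taken from the project.

```python
from typing import Callable, Dict, List, Set

SubGraph = Dict[str, Set[str]]


class InMemoryGraph:
    """Hypothetical stand-in for NebulaGraph: undirected review-to-review edges."""

    def __init__(self) -> None:
        self.adj: Dict[str, Set[str]] = {}

    def insert_review(self, vid: str, linked_to: List[str]) -> None:
        # Step 0: insert the new review and its edges.
        self.adj.setdefault(vid, set())
        for other in linked_to:
            self.adj[vid].add(other)
            self.adj.setdefault(other, set()).add(vid)

    def subgraph(self, vid: str, hops: int = 2) -> SubGraph:
        # Step 1: breadth-first walk collecting nodes within `hops` of the review.
        seen = {vid}
        frontier = {vid}
        for _ in range(hops):
            frontier = {n for v in frontier for n in self.adj[v]} - seen
            seen |= frontier
        return {v: self.adj[v] & seen for v in seen}


KNOWN_FRAUD = {"r1", "r2"}  # made-up labels for the toy predictor


def toy_predict(sub: SubGraph, vid: str) -> float:
    """Placeholder for the GNN: share of direct neighbors already known as fraud."""
    nbrs = sub[vid]
    return len(nbrs & KNOWN_FRAUD) / len(nbrs) if nbrs else 0.0


def add_review(
    graph: InMemoryGraph,
    vid: str,
    linked_to: List[str],
    predict: Callable[[SubGraph, str], float],
) -> Dict[str, bool]:
    graph.insert_review(vid, linked_to)  # 0. insert record
    sub = graph.subgraph(vid)            # 1. get the new record's subgraph
    score = predict(sub, vid)            # 2. + 3. send subgraph, get prediction
    return {"is_fraud": score >= 0.5}


g = InMemoryGraph()
g.insert_review("r1", [])
g.insert_review("r2", ["r1"])
g.insert_review("r3", [])
risky = add_review(g, "2049", ["r1", "r2"], toy_predict)
clean = add_review(g, "2050", ["r3"], toy_predict)
```

Passing the predictor in as a callable keeps the graph store and the model decoupled, mirroring the separation between the graph database and the Inference API in the diagram.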

Frontend

Each review request is sent to the graph database and the Inference API. When the fraud prediction is returned to the Inference API caller, the result is also broadcast, in parallel, to the Real-Time Fraud Monitor dashboards.

The dashboards are tables that subscribe to the stream of incoming reviews. When a record is flagged as high fraud risk, the corresponding party is notified and involved for follow-up actions.
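
The "reply to the caller and broadcast to the dashboards" idea can be sketched with a tiny in-process publish/subscribe hub. This is an assumption-laden illustration: `FraudMonitorHub` and `handle_review` are hypothetical names, the real service pushes updates to browsers over the network rather than calling Python callbacks, and this sketch notifies subscribers sequentially rather than in parallel.

```python
from typing import Callable, Dict, List, Tuple

Prediction = Dict[str, object]


class FraudMonitorHub:
    """Hypothetical hub; each subscriber models one dashboard table."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Prediction], None]] = []

    def subscribe(self, callback: Callable[[Prediction], None]) -> None:
        self._subscribers.append(callback)

    def broadcast(self, prediction: Prediction) -> None:
        for callback in self._subscribers:
            callback(prediction)


def handle_review(vertex_id: str, is_fraud: bool, hub: FraudMonitorHub) -> Prediction:
    """Return the prediction to the caller and push the same result to dashboards."""
    result: Prediction = {"vertex_id": vertex_id, "is_fraud": is_fraud}
    hub.broadcast(result)  # every subscribed dashboard receives the record
    return result          # the API caller receives it as the response


hub = FraudMonitorHub()
dashboard_rows: List[Tuple[str, str]] = []
hub.subscribe(
    lambda p: dashboard_rows.append((str(p["vertex_id"]), "NOK" if p["is_fraud"] else "OK"))
)

reply = handle_review("2049", False, hub)
handle_review("2050", True, hub)
```

The OK/NOK status column follows the table in the dashboard diagram; a row marked NOK is the one that would trigger a notification.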

Demo Video 👉🏻:

GraphSAGE_FraudDetection_demo.mov
         ┌────────────────────────────────────────────────────────────────────┐         
         │   ┌──────────────────────────────────────────────────────────┐     │         
         │   │        Real-Time Online Fraud Monitor Web Service        │     │         
         │   └──────────────────────────────────────────────────────────┘     │         
         │                                                                    │         
         │   ┌────┬────┬──────┬────┬────┬────┬────┬────┬────┬────┬──────┐     │         
         │   │    │    │      │    │    │    │    │    │    │    │  OK  │     │         
         │   ├────┼────┼──────┼────┼────┼────┼────┼────┼────┼────┼──────┤     │         
         │   │    │    │      │    │    │    │    │    │    │    │  OK  │     │         
         │   ├────┼────┼──────┼────┼────┼────┼────┼────┼────┼────┼──────┤     │         
         │   │    │    │      │    │    │    │    │    │    │    │ NOK  │     │         
         │   └────┴────┴──────┴────┴────┴────┴────┴────┴────┴────┴──────┘     │         
         └─────────────────────────────────▲──────────────────────────────────┘         
                                           ┃                                            
┌───────────────────────┐                  ┃                                            
│  New Review/Requests  │                  ┃                                            
│Generated Continuously │                  ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓               
└───────────────────────┘                                               ┃               
            │                                                           ┃               
            │                                                           ┃               
            │ ┌─────────────────────┐                          ┌────────┻────────┐      
            │ │                     │                          │                 │      
            │ │                     │                          │                 │      
            └▶│ Transaction Record  ├──────2. Fraud Risk ─────▶│  Inference API  │◀────┐
              │                     │◀────Prediction with ─────┤                 │     │
              │                     │        Sub Graph         │                 │     │
              └─────────────────────┘                          └─────────────────┘     │
...

Graph Model and Data Set

We use the Yelp-Fraud dataset, which comes from the paper Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters.

There is one type of node and three types of edges:

  • Node: a review of a restaurant or hotel, with label and feature properties:
    • is_fraud is the label
    • 32 engineered features
  • Edge: three types, all between review nodes and without properties:
    • R-U-R: the two reviews share the same reviewer, named shares_user_with
    • R-S-R: the two reviews give the same rating to the same object, named shares_restaurant_rating_with
    • R-T-R: the two reviews of the same object were submitted in the same month, named shares_restaurant_in_one_month_with
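
As a rough mental model of this schema, the following sketch holds the same structure in plain Python. The edge type names and the feature count come from the list above; the class names `Review` and `ReviewGraph` and the validation rules are assumptions for illustration, not the NebulaGraph schema definition itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

EDGE_TYPES = (
    "shares_user_with",                     # R-U-R
    "shares_restaurant_rating_with",        # R-S-R
    "shares_restaurant_in_one_month_with",  # R-T-R
)
NUM_FEATURES = 32


@dataclass
class Review:
    vid: str
    is_fraud: bool         # the label
    features: List[float]  # the 32 engineered features

    def __post_init__(self) -> None:
        if len(self.features) != NUM_FEATURES:
            raise ValueError(f"expected {NUM_FEATURES} features, got {len(self.features)}")


@dataclass
class ReviewGraph:
    nodes: Dict[str, Review] = field(default_factory=dict)
    edges: Dict[str, List[Tuple[str, str]]] = field(
        default_factory=lambda: {t: [] for t in EDGE_TYPES}
    )

    def add_review(self, review: Review) -> None:
        self.nodes[review.vid] = review

    def add_edge(self, edge_type: str, src: str, dst: str) -> None:
        if edge_type not in self.edges:
            raise KeyError(f"unknown edge type: {edge_type}")
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be existing reviews")
        self.edges[edge_type].append((src, dst))  # edges carry no properties


graph = ReviewGraph()
graph.add_review(Review("r1", False, [0.0] * NUM_FEATURES))
graph.add_review(Review("r2", True, [1.0] * NUM_FEATURES))
graph.add_edge("shares_user_with", "r1", "r2")
```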

Before this project, I built a playground that ingests the Yelp data graph into NebulaGraph; see https://github.com/wey-gu/nebulagraph-yelp-frauddetection for more.

Playground Setup with Data Ingestion

You can run the following commands to get it ready:

# Deploy NebulaGraph for Playground
curl -fsSL nebula-up.siwei.io/install.sh | bash

# Clone the data downloader repo
git clone https://github.com/wey-gu/nebulagraph-yelp-frauddetection && cd nebulagraph-yelp-frauddetection

# Install the requirements, then download the data and prepare it for NebulaGraph
python3 -m pip install -r requirements.txt
python3 data_download.py

# Import it to NebulaGraph
docker run --rm -ti \
 --network=nebula-net \
 -v ${PWD}/yelp_nebulagraph_importer.yaml:/root/importer.yaml \
 -v ${PWD}/data:/root \
 vesoft/nebula-importer:v3.1.0 \
 --config /root/importer.yaml

Then refer to notebooks/* for training the model and for the Real-time Fraud Detection Web Service itself, and refer to src/* for the reference implementation of the web service.

Playground of Real-Time Fraud Monitor

By following these steps, you should be able to run the Real-time Fraud Detection Web Service with my trained model loaded.

Get your machine's IP address (not 127.0.0.1); say it is 10.0.0.5.

export MY_IP="10.0.0.5"
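
If you are not sure which address to use, here is a small best-effort sketch for finding a non-loopback IPv4 address with Python's standard library. The function name `guess_lan_ip` and the probe address are assumptions; on machines with several network interfaces you should still double-check that the result is the address your browser can reach.

```python
import socket


def guess_lan_ip(probe_host: str = "10.255.255.255") -> str:
    """Best-effort guess at this machine's non-loopback IPv4 address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only makes the OS
        # pick the outgoing interface, whose address we then read back.
        sock.connect((probe_host, 1))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route; fall back to loopback
    finally:
        sock.close()


my_ip = guess_lan_ip()
print(my_ip)
```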

Run Backend:

git clone https://github.com/wey-gu/NebulaGraph-Fraud-Detection-GNN.git
cd NebulaGraph-Fraud-Detection-GNN/src

# Replace the demo hostname with MY_IP in the backend CORS setting, the frontend component, and nginx.conf
sed -i "s/nebula-demo.siwei.io/$MY_IP/g" fraudd_backend/fraudd/__init__.py
sed -i "s/nebula-demo.siwei.io/$MY_IP/g" fraudd_frontend/src/components/Table.vue
sed -i "s/nebula-demo.siwei.io/$MY_IP/g" nginx.conf

# Install the backend dependencies
python3 -m pip install -r requirements.txt

export NG_ENDPOINTS="127.0.0.1:9669";
export FLASK_ENV=development;
export FLASK_APP=wsgi;

# run backend
cd fraudd_backend

python3 -m flask run --reload --host=0.0.0.0
# verify (from another terminal, since flask run keeps this one busy)
curl localhost:5000/api
{
  "status": "ok"
}

From another terminal, build frontend:

cd NebulaGraph-Fraud-Detection-GNN/src
cd fraudd_frontend

# sudo apt install npm

npm install
npm run build

From another terminal, run Nginx:

cd NebulaGraph-Fraud-Detection-GNN/src
docker-compose up -d
# end-to-end verify backend
curl -X POST localhost:15000/api/add_review \
        -d '{"vertex_id": "2049"}' \
        -H 'Content-Type: application/json'
# return value
{
  "is_fraud": false
}
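
The same end-to-end check can be done from Python with only the standard library. This is a sketch that mirrors the curl call above (same path, payload and header); the helper names `build_add_review_request` and `add_review`, the default base URL and the timeout are assumptions, and the actual call only works once the backend and Nginx from this section are running.

```python
import json
import urllib.request


def build_add_review_request(
    vertex_id: str, base_url: str = "http://localhost:15000"
) -> urllib.request.Request:
    """Build the POST request equivalent to the curl command."""
    body = json.dumps({"vertex_id": vertex_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/add_review",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def add_review(
    vertex_id: str, base_url: str = "http://localhost:15000", timeout: float = 10.0
) -> bool:
    """Send the request and return the is_fraud field of the JSON reply."""
    request = build_add_review_request(vertex_id, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bool(json.load(response)["is_fraud"])


# Building the request needs no network access, so it is safe to inspect:
request = build_add_review_request("2049")

# Uncomment once the services are up:
# print(add_review("2049"))
```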

Then open it from a web browser (replace 10.0.0.5 with your own MY_IP) 👉🏻 http://10.0.0.5:8080/

You can also check my demo: http://nebula-demo.siwei.io:8080