GraphGen: A Scalable Approach to Domain-agnostic Labeled Graph Generation

This repository is the official PyTorch implementation of GraphGen, an auto-regressive generative model for labeled graphs.

Nikhil Goyal, Harsh Vardhan Jain, and Sayan Ranu, GraphGen: A Scalable Approach to Domain-agnostic Labeled Graph Generation, in WWW, 2020.

Most of the code has been adapted from GraphRNN.

Installation

We recommend the Anaconda distribution for Python and other packages. The code has been tested with PyTorch 1.2.0 and Python 3.7.0.

Install PyTorch and pip with conda. Change the CUDA version to match your GPU hardware support.

conda install pip pytorch=1.2.0 torchvision cudatoolkit=10.1 -c pytorch

Then install the other dependencies.

pip install -r requirements.txt

Boost (version >= 1.70.0) and OpenMP are required for compiling the C++ binaries. Run the build.sh script in the project's root directory.

./build.sh

Test run

python3 main.py

Code description

main.py is the main script file, and specific arguments are set in args.py.
train.py contains the training loop framework and calls the training files specific to each generative algorithm.
datasets/preprocess.py and util.py contain preprocessing and utility functions.
datasets/process_dataset.py reads graphs from various formats.

GraphGen:

dfscode/dfs_code.cpp calculates the minimum DFS code required by GraphGen. It is adapted from kaviniitm. dfscode/dfs_wrapper.py is a Python wrapper for the C++ file.
graphgen/model.py and graphgen/data.py contain the model and the DataLoader class, respectively.
graphgen/train.py contains the core loss evaluation and generation algorithm for GraphGen.
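
To make the DFS code mentioned above more concrete, here is a minimal Python sketch that builds one DFS code for a small labeled graph. It is not the repository's implementation: the function name dfs_code, the dictionary-based graph format, and the example labels are assumptions made for illustration. It also returns the code for a single traversal only, whereas dfscode/dfs_code.cpp computes the minimum code, i.e. the smallest one over all traversals.

```python
def dfs_code(adj, node_labels, edge_labels, start):
    """Return one DFS code for a connected, undirected, labeled graph.

    Each entry is a 5-tuple (t_u, t_v, L_u, L_uv, L_v), where t_* are DFS
    discovery timestamps and L_* are node and edge labels.
    """
    time = {}   # vertex -> discovery timestamp
    code = []   # the edge sequence being built

    def label(u, v):
        # Undirected edge labels are keyed by an unordered pair.
        return edge_labels[frozenset((u, v))]

    def visit(v, parent):
        time[v] = len(time)
        # Backward edges: to already discovered neighbours other than the
        # parent, ordered by the neighbour's timestamp.
        back = sorted(
            (u for u in adj[v] if u in time and u != parent and u != v),
            key=time.get,
        )
        for u in back:
            code.append((time[v], time[u], node_labels[v], label(v, u), node_labels[u]))
        # Forward edges: discover each unvisited neighbour in turn.
        for u in adj[v]:
            if u not in time:
                # len(time) is the timestamp u receives when visited.
                code.append((time[v], len(time), node_labels[v], label(v, u), node_labels[u]))
                visit(u, v)

    visit(start, None)
    return code


# Hypothetical triangle with atom-like node labels and bond-like edge labels.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
node_labels = {"a": "C", "b": "N", "c": "O"}
edge_labels = {
    frozenset(("a", "b")): "single",
    frozenset(("b", "c")): "double",
    frozenset(("a", "c")): "single",
}

triangle_code = dfs_code(adj, node_labels, edge_labels, "a")
for entry in triangle_code:
    print(entry)
```

Starting the traversal from a different vertex, or visiting neighbours in a different order, generally yields a different code for the same graph, which is why a canonical (minimum) code is needed.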

For baseline models:

We extend the DeepGMG model to labeled graphs based on the DGL (Deep Graph Library). DeepGMG-specific files are in the baselines/dgmg/ folder.
We extend the GraphRNN model to labeled graphs based on the GraphRNN code. GraphRNN-specific code is in the baselines/graph_rnn/ folder.

Parameter setting:

All input arguments and hyperparameter settings are in args.py.
Set args.note to specify which generative algorithm (GraphGen, GraphRNN or DeepGMG) to run.
For example, args.device controls which device (GPU) is used to train the model, and args.graph_type specifies which dataset is used to train the generative model.
See the documentation in args.py for more detailed descriptions of all fields.
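
As a rough illustration of the fields named above, the sketch below uses a stand-in object rather than the real class in args.py. Only the field names (note, device, graph_type) come from this README; the exact strings accepted for args.note, the device string 'cuda:0', and the dataset name 'my_dataset' are assumptions, so check args.py for the values it actually accepts.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the settings object defined in args.py.
# The field names come from this README; every value is an assumption.
args = SimpleNamespace(
    note="GraphGen",          # which generative algorithm to run
    device="cuda:0",          # assumed PyTorch-style device string
    graph_type="my_dataset",  # placeholder dataset name, not a real option
)

# Algorithm names as listed in this README; the real spelling may differ.
ALGORITHMS = ("GraphGen", "GraphRNN", "DeepGMG")
if args.note not in ALGORITHMS:
    raise ValueError(f"unknown algorithm: {args.note}")

summary = f"Training {args.note} on {args.graph_type} using {args.device}"
print(summary)
```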

Outputs

There are several different types of outputs, each saved into a different directory under a path prefix. The path prefix is set in args.dir_input. Suppose that this field is set to '':

tensorboard/ contains TensorBoard event files, which can be used to view training and validation curves in real time.
model_save/ stores the model checkpoints.
tmp/ stores all the temporary files generated during training and evaluation.
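
The layout above can be sketched as follows, assuming each output directory is formed by joining the prefix with the directory name. The helper output_dirs is hypothetical and is not part of the repository; how the code actually builds these paths may differ.

```python
import os


def output_dirs(prefix):
    """Return the output directories under a path prefix (hypothetical helper)."""
    names = ("tensorboard/", "model_save/", "tmp/")
    return [os.path.join(prefix, name) for name in names]


# With an empty prefix the directories are relative to the working directory.
for path in output_dirs(""):
    print(path)
```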

Evaluation

Evaluation is done in evaluate.py, where the user can choose which model to evaluate. Change the ArgsEvaluate class fields accordingly.
We use the GraphRNN implementation for structural metrics.
NSPDK is evaluated using the EDeN Python package.
metrics/isomorph.cpp and metrics/unique.cpp contain C++ function calls to the Boost subgraph isomorphism algorithm to evaluate novelty and uniqueness.

To evaluate, run

python3 evaluate.py