This repository is the official PyTorch implementation of GraphGen, an auto-regressive generative model for graphs.
Nikhil Goyal, Harsh Vardhan Jain, and Sayan Ranu, GraphGen: A Scalable Approach to Domain-agnostic Labeled Graph Generation, in WWW, 2020.
Most of the code has been adapted from GraphRNN.
We recommend the Anaconda distribution for Python and other packages. The code has been tested with PyTorch 1.2.0 and Python 3.7.0.
Install PyTorch and pip in conda. Change the CUDA version to match what your GPU hardware supports.
conda install pip pytorch=1.2.0 torchvision cudatoolkit=10.1 -c pytorch
Then install the other dependencies.
pip install -r requirements.txt
Boost (version >= 1.70.0) and OpenMP are required for compiling the C++ binaries. Run the build.sh script in the project's root directory:
./build.sh
Then run the main script:
python3 main.py
- main.py is the main script file, and its arguments are set in args.py.
- train.py implements the training-loop framework and calls the training code specific to each generative algorithm.
- datasets/preprocess.py and util.py contain preprocessing and utility functions.
- datasets/process_dataset.py reads graphs from various formats.
GraphGen:
- dfscode/dfs_code.cpp calculates the minimum DFS code required by GraphGen. It is adapted from kaviniitm.
- dfscode/dfs_wrapper.py is a Python wrapper for the C++ file.
- graphgen/model.py and graphgen/data.py contain the model and the DataLoader class, respectively.
- graphgen/train.py contains the core loss evaluation and generation algorithm for GraphGen.
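To illustrate what a DFS code looks like, here is a minimal pure-Python sketch that produces one DFS code for a small connected labeled graph from a single traversal. It assumes the 5-tuple edge format (t_u, t_v, L(u), L(u,v), L(v)) described in the GraphGen paper, and the function name dfs_code and the adjacency-dict input are hypothetical. It does not compute the minimum DFS code (the smallest code over all valid traversals under gSpan's DFS lexicographic order); that is the job of dfs_code.cpp.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: node -> label, and node -> {neighbor: edge label}.
Labels = Dict[int, str]
Adjacency = Dict[int, Dict[int, str]]
Edge5 = Tuple[int, int, str, str, str]


def dfs_code(labels: Labels, adj: Adjacency, start: int) -> List[Edge5]:
    """Return ONE DFS code (not the minimum) for a connected undirected graph.

    Each edge is a 5-tuple (t_u, t_v, L(u), L(u, v), L(v)), where t_* are DFS
    discovery timestamps. Forward (tree) edges have t_u < t_v; backward edges
    have t_u > t_v and are emitted right after their source vertex is discovered.
    """
    timestamp = {start: 0}
    code: List[Edge5] = []

    def visit(u: int) -> None:
        for v in sorted(adj[u]):  # fixed neighbor order, so the output is reproducible
            if v in timestamp:
                continue  # already discovered: this edge was emitted as a tree or backward edge
            timestamp[v] = len(timestamp)
            code.append((timestamp[u], timestamp[v], labels[u], adj[u][v], labels[v]))
            # Backward edges from v to earlier vertices (its ancestors), excluding the parent u.
            earlier = [w for w in adj[v] if w in timestamp and w != u]
            for w in sorted(earlier, key=timestamp.__getitem__):
                code.append((timestamp[v], timestamp[w], labels[v], adj[v][w], labels[w]))
            visit(v)

    visit(start)
    return code


# Usage: a labeled triangle C-C-O with one double bond ("d") and two single bonds ("s").
labels = {0: "C", 1: "C", 2: "O"}
adj = {0: {1: "s", 2: "d"}, 1: {0: "s", 2: "s"}, 2: {0: "d", 1: "s"}}
print(dfs_code(labels, adj, start=0))
# [(0, 1, 'C', 's', 'C'), (1, 2, 'C', 's', 'O'), (2, 0, 'O', 'd', 'C')]
```

Starting the traversal at a different vertex generally yields a different code for the same graph, which is why a canonical (minimum) DFS code is needed to give each graph a unique sequence representation.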
For baseline models:
- We extend the DeepGMG model to labeled graphs, based on the DGL (Deep Graph Library). DeepGMG-specific files are contained in the baselines/dgmg/ folder.
- We extend the GraphRNN model to labeled graphs, building on the GraphRNN codebase. GraphRNN-specific code is contained in the baselines/graph_rnn/ folder.
Parameter setting:
- All input arguments and hyperparameter settings are defined in args.py.
- Set args.note to specify which generative algorithm (GraphGen, GraphRNN, or DeepGMG) to run.
- Other fields control the rest of the setup. For example, args.device controls which device (GPU) is used to train the model, and args.graph_type specifies which dataset is used to train the generative model.
- See the documentation in args.py for more detailed descriptions of all fields.
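As a rough illustration of how these fields fit together, the sketch below mirrors four of the documented fields (note, device, graph_type, dir_input) in a dataclass and dispatches on note to an algorithm-specific training entry point. The default values, the accepted note strings, the TRAINERS table, and the train_* stand-ins are all hypothetical; the real names and behavior are in args.py and train.py.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Args:
    # Hypothetical defaults; the real values live in args.py.
    note: str = "GraphGen"          # which generative algorithm to run
    device: str = "cuda:0"          # device used for training
    graph_type: str = "my_dataset"  # placeholder dataset name
    dir_input: str = ""             # path prefix for all outputs


def train_graphgen(args: Args) -> str:   # stand-in for graphgen/train.py
    return f"GraphGen on {args.graph_type} ({args.device})"


def train_graph_rnn(args: Args) -> str:  # stand-in for code in baselines/graph_rnn/
    return f"GraphRNN on {args.graph_type} ({args.device})"


def train_dgmg(args: Args) -> str:       # stand-in for code in baselines/dgmg/
    return f"DeepGMG on {args.graph_type} ({args.device})"


# Assumed mapping from args.note to an algorithm-specific training function.
TRAINERS: Dict[str, Callable[[Args], str]] = {
    "GraphGen": train_graphgen,
    "GraphRNN": train_graph_rnn,
    "DGMG": train_dgmg,
}


def run(args: Args) -> str:
    """Look up the trainer for args.note and call it, failing loudly on unknown values."""
    try:
        trainer = TRAINERS[args.note]
    except KeyError:
        raise ValueError(
            f"unknown args.note {args.note!r}; expected one of {sorted(TRAINERS)}"
        ) from None
    return trainer(args)


print(run(Args(note="GraphRNN", device="cpu")))  # GraphRNN on my_dataset (cpu)
```

A lookup table keeps the selection logic in one place and turns a typo in note into an immediate error instead of silently training the wrong model.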
There are several different types of outputs, each saved into a different directory under a path prefix. The path prefix is set by args.dir_input. Suppose that this field is set to '':
- tensorboard/ contains TensorBoard event files, which can be used to view training and validation curves in real time.
- model_save/ stores the model checkpoints.
- tmp/ stores all the temporary files generated during training and evaluation.
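A minimal sketch of how the three output directories could be laid out under the prefix with os.path.join. The helper name output_dirs is hypothetical, and the repository's own path handling may differ.

```python
import os
from typing import Dict


def output_dirs(dir_input: str, create: bool = True) -> Dict[str, str]:
    """Return the output directories under the prefix dir_input (hypothetical helper)."""
    dirs = {
        "tensorboard": os.path.join(dir_input, "tensorboard"),  # TensorBoard event files
        "model_save": os.path.join(dir_input, "model_save"),    # model checkpoints
        "tmp": os.path.join(dir_input, "tmp"),                  # temporary files
    }
    if create:
        for path in dirs.values():
            os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return dirs


# With an empty prefix, the directories are created relative to the working directory.
print(output_dirs("", create=False))
# {'tensorboard': 'tensorboard', 'model_save': 'model_save', 'tmp': 'tmp'}
```

With the prefix set to '', the event files can then be viewed by running tensorboard --logdir tensorboard from the same working directory.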
- The evaluation is done in evaluate.py, where the user can choose which model to evaluate. Change the ArgsEvaluate class fields accordingly.
- We use the GraphRNN implementation for structural metrics.
- NSPDK is evaluated using the EDeN Python package.
- metrics/isomorph.cpp and metrics/unique.cpp contain C++ calls to the Boost subgraph isomorphism algorithm, used to evaluate novelty and uniqueness.
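To make the two metrics concrete, here is a brute-force Python sketch for tiny labeled graphs. It assumes novelty is the fraction of generated graphs not isomorphic to any training graph, and uniqueness is the number of distinct isomorphism classes divided by the number of generated graphs. These definitions, the (labels, edges) graph representation, and the function names are assumptions. The sketch also swaps the Boost isomorphism calls for an exhaustive search over node orderings, which is only feasible for graphs with a handful of nodes.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# Hypothetical representation: (node labels, {(u, v): edge label}) for an undirected graph.
Graph = Tuple[Dict[int, str], Dict[Tuple[int, int], str]]


def canonical_form(graph: Graph):
    """Smallest relabeled encoding over all node orderings (exponential; tiny graphs only).

    Two labeled graphs are isomorphic exactly when their canonical forms are equal.
    """
    labels, edges = graph
    nodes = list(labels)
    best = None
    for perm in permutations(range(len(nodes))):
        pos = dict(zip(nodes, perm))  # original node -> new index
        node_part = tuple(lab for _, lab in sorted((pos[n], labels[n]) for n in nodes))
        edge_part = tuple(sorted(
            (min(pos[u], pos[v]), max(pos[u], pos[v]), lab) for (u, v), lab in edges.items()
        ))
        key = (node_part, edge_part)
        if best is None or key < best:
            best = key
    return best


def novelty(generated: List[Graph], training: List[Graph]) -> float:
    """Fraction of generated graphs not isomorphic to any training graph (assumed definition)."""
    seen = {canonical_form(g) for g in training}
    return sum(canonical_form(g) not in seen for g in generated) / len(generated)


def uniqueness(generated: List[Graph]) -> float:
    """Distinct isomorphism classes divided by the number of generated graphs (assumed definition)."""
    return len({canonical_form(g) for g in generated}) / len(generated)


training = [({0: "C", 1: "O"}, {(0, 1): "s"})]
generated = [
    ({5: "O", 7: "C"}, {(7, 5): "s"}),  # isomorphic to the training graph
    ({0: "C", 1: "O"}, {(0, 1): "s"}),  # an exact copy of the training graph
    ({0: "C", 1: "C"}, {(0, 1): "s"}),  # not seen in training
]
print(novelty(generated, training))  # 1 of 3 generated graphs is novel
print(uniqueness(generated))         # 2 distinct classes among 3 generated graphs
```

Both functions divide by len(generated), so they expect a non-empty list of generated graphs.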
To evaluate, run
python3 evaluate.py