Data directory explanation

all_noniso_graphs: All the non-isomorphic graphs with nodes in range [2, 9] in a specific format. To convert the to nx objects, use the noniso_graphs_to_nx function.
patterns: all pattern files are in the data/patterns directory.
- triangle: named p_htw3_3_3_17 in the directory.
- star: named p_htw3_3_3_16 in the directory.
- rectangle with an edge: named p_htw3_4_5_3 in the directory.
- pentagon: named p_htw3_5_5_19 in the directory.
- pentagon with an edge: named p_htw3_5_5_20 in the directory.
datasets: all data graphs are in the corresponding data/DATASET_NAME directory, including
- g8_graphs
- ogbg_molhiv
- zinc
- qm9
forbidden_minors: The 5 forbidden minor graphs of graphs with treewidth=3 are in the data/forbidden_minors directory.

Note: The g8_graphs (called N8Graphs in the paper) has been concluded in the data/all_noniso_graphs directory. Other datasets can be downloaded from public urls using the following command:

python -m gmatch.preprocessing -o generate --dataset ogbg_molhiv # download the ogbg_molhiv dataset to the data/ogbg_molhiv/ directory.

Install

To run the code, editable installation is required.

cd /repo_root_directory
pip install -e . # install the code as a module in an editable mode.
chmod +x ./src/gmatch/subcounting/SubgraphMatching/build/matching/SubgraphMatching.out

Preprocess

Prepare graphs (download datasets)

Prepare graphs aims to download other three public graph datasets. The download relies on the PyG and OGB library.

python -m gmatch.preprocessing -o generate --dataset DATASET_NAME

This will downloads all graph files of the DATASET_NAME dataset in .graph format in data/DATASET_NAME directory.

Process graphs

Process graphs aims to generate subcount labels for LPP and other baselines to train and test their performance. This step will require the designation of both a dataset and a pattern.

python -m gmatch.preprocessing -o process --dataset DATASET_NAME --pattern_name PATTERN_NAME

The processing mainly consists of two phases:

Count number of the pattern in each graph as train/test labels.
Split graphs into train/val/test sets, and build a index file used in the WholeGraphSampler sampler (used by LPP model).

Train & test

Train and test the model with a dataset and a pattern, e.g.,

python main.py --dataset g8_graphs --pattern p_htw3_3_3_16 --model cnn --epoch 1 --batch_size 4 --device cuda:0

Convert datasets formats

Convert datasets formats aims to convert the graphs in data/DATASET_NAME directory to the data formats required by other baselines, including GIN/GCN, PPGN, and LRP. For example, if we want to validate the performance of LRP for substructure counting, we need to:

Clone the code of LRP in the directory which the LPP lies in.
Convert a dataset to LRP's input format.
Run the LRP code.

python -m gmatch.converter -m model_name -d dataset -p pattern # model_name can be GIN, PPGN, LRP.

Show statistics of counts for a dataset and a pattern

The statistics figures will be saved in the figures/ directory.

python -m gmatch.statistics -d dataset -p pattern

xiawenwen49/LPP-code