This repository is the codebase of the paper "A Simple and Scalable Graph Neural Network for Large Directed Graphs".
- GNN using all the combinations of aggregated features and adjacency lists in directed/undirected graphs
- A2DUG
- GNN for undirected graphs
- GNN for directed graphs
- Methods using adjacency lists as node features
The A2DUG codebase uses the following dependencies:
- python 3 (tested with 3.8)
- numpy (tested with 1.23.4)
- pytorch (tested with 1.11.0)
We recommend installing using conda. The following will install all dependencies:
git clone https://github.com/seijimaekawa/A2DUG.git
cd A2DUG
conda create --name a2dug python=3.8
conda activate a2dug
conda install pytorch==1.11.0 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
You can run the code using the best parameter set used in our paper:
python src/main.py --model A2DUG --dataset arxiv-year
For large-scale graphs (snap-patents, pokec, and wiki), you can use the --minibatch option as follows:
python src/main.py --model A2DUG --dataset pokec --minibatch
The code saves the experimental results into experiments/.
For methods that can take a graph as either directed or undirected input (LINK, LINKX, and GloGNN++), you can specify the --directed option as follows:
python src/main.py --model LINKX --dataset arxiv-year --directed
If you do not specify the option, the input graph is treated as undirected.
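To illustrate what treating a directed input as undirected can involve, the following is a minimal sketch in plain Python. The function name `to_undirected` and the toy edge list are hypothetical and are not taken from this repository; the sketch assumes that reciprocal edges collapse into a single undirected edge, which may differ from how the codebase actually symmetrizes its graphs.

```python
def to_undirected(edges):
    """Collapse directed edges (u, v) into unique unordered pairs."""
    undirected = set()
    for u, v in edges:
        # Store each pair in a canonical order so (u, v) and (v, u) coincide.
        undirected.add((min(u, v), max(u, v)))
    return sorted(undirected)


# Toy directed graph: 0 -> 1 and 1 -> 0 are reciprocal edges.
directed_edges = [(0, 1), (1, 0), (1, 2), (2, 3)]
undirected_edges = to_undirected(directed_edges)
print(len(directed_edges), len(undirected_edges))
```

Under this assumption, the direction information of the reciprocal pair is lost after symmetrization, which is the kind of information the --directed option is meant to preserve.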
To reproduce the ablation study in the paper, you can run A2DUG with --wo_directed, --wo_undirected, --wo_agg, --wo_adj, or --wo_transpose as follows:
python src/main.py --model A2DUG --dataset arxiv-year --wo_directed
python src/main.py --model A2DUG --dataset arxiv-year --wo_undirected
python src/main.py --model A2DUG --dataset arxiv-year --wo_agg
python src/main.py --model A2DUG --dataset arxiv-year --wo_adj
python src/main.py --model A2DUG --dataset arxiv-year --wo_transpose
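If you prefer to launch all ablation variants from one script, a small driver along these lines may help. It is only a sketch: the helper names `build_ablation_commands` and `run_all` are hypothetical, the flags are copied from the commands above, and it assumes the script is started from the repository root.

```python
import subprocess
import sys

# Ablation flags as they appear in the commands above.
ABLATION_FLAGS = [
    "--wo_directed",
    "--wo_undirected",
    "--wo_agg",
    "--wo_adj",
    "--wo_transpose",
]


def build_ablation_commands(dataset="arxiv-year"):
    """Return one command (as an argument list) per ablation flag."""
    base = [sys.executable, "src/main.py", "--model", "A2DUG", "--dataset", dataset]
    return [base + [flag] for flag in ABLATION_FLAGS]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing run.
            subprocess.run(cmd, check=True)


commands = build_ablation_commands()
run_all(commands)  # dry run: prints the five commands without executing them
```

Calling `run_all(commands, dry_run=False)` would execute the runs one after another.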
The hyperparameter search space for each model is listed in JSON files.
We also provide the best parameter sets used in Tables 2, 3, 4, 5, and 9 of the paper.
cd A2DUG
python src/main.py --model A2DUG --dataset arxiv-year --optuna
The code loads the hyperparameter search space specified in the JSON files. After 100 runs, it saves the best parameter set to the folder of best parameter sets.
This framework supports the following real-world datasets:
Dataset | Nodes | Edges | Undirected Edges | Attributes | Labels | Prediction Target |
---|---|---|---|---|---|---|
cornell | 183 | 298 | 280 | 1,703 | 5 | web page category |
texas | 183 | 325 | 295 | 1,703 | 5 | web page category |
wisconsin | 251 | 515 | 466 | 1,703 | 5 | web page category |
citeseer | 3,327 | 4,715 | 4,660 | 3,703 | 6 | research field |
cora_ml | 2,995 | 8,416 | 8,158 | 2,879 | 7 | research field |
chameleon-filtered | 890 | 13,584 | 8,904 | 2,325 | 5 | web page traffic |
squirrel-filtered | 2,223 | 65,718 | 47,138 | 2,089 | 5 | web page traffic |
genius | 421,961 | 984,979 | 922,868 | 12 | 2 | marked act. |
ogbn-arxiv | 169,343 | 1,166,243 | 1,157,799 | 128 | 40 | research field |
arxiv-year | 169,343 | 1,166,243 | 1,157,799 | 128 | 5 | publication year |
snap-patents | 2,923,922 | 13,975,788 | 13,972,547 | 269 | 5 | time granted |
pokec | 1,632,803 | 30,622,564 | 22,301,964 | 65 | 2 | gender |
wiki | 1,925,342 | 303,434,860 | 242,605,360 | 600 | 5 | total page views |
By changing --dataset [dataset name], users can choose a dataset.
We provide a Jupyter notebook for calculating edge homophily ratios.
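For reference, the edge homophily ratio is commonly defined as the fraction of edges whose two endpoints share the same label. The sketch below implements that common definition in plain Python; the function name `edge_homophily_ratio` and the toy graph are hypothetical, and the notebook may compute the ratio differently (for example, on the undirected version of the graph).

```python
def edge_homophily_ratio(edges, labels):
    """Fraction of edges whose two endpoints carry the same label."""
    if not edges:
        raise ValueError("edge list is empty")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy graph with four nodes and two classes.
labels = [0, 0, 1, 1]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
ratio = edge_homophily_ratio(edges, labels)
print(ratio)  # 2 of the 4 edges join same-label nodes
```

A ratio near 1 indicates a homophilous graph, while a low ratio indicates that edges mostly connect nodes with different labels.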
We assume that all experiments are conducted with a single NVIDIA A100-PCIE-40GB.