Primary language: Python. License: MIT.

SlotGAT: Slot-based Message Passing for Heterogeneous Graphs (ICML 2023)

Code and data for SlotGAT, our heterogeneous graph neural network method, from the paper "SlotGAT: Slot-based Message Passing for Heterogeneous Graphs" (ICML 2023) (https://proceedings.mlr.press/v202/zhou23j)

Abstract

Heterogeneous graphs are ubiquitous to model complex data. There are urgent needs on powerful heterogeneous graph neural networks to effectively support important applications. We identify a potential semantic mixing issue in existing message passing processes, where the representations of the neighbors of a node v are forced to be transformed to the feature space of v for aggregation, though the neighbors are in different types. That is, the semantics in different node types are entangled together into node v’s representation. To address the issue, we propose SlotGAT with separate message passing processes in slots, one for each node type, to maintain the representations in their own node-type feature spaces. Moreover, in a slot-based message passing layer, we design an attention mechanism for effective slot-wise message aggregation. Further, we develop a slot attention technique after the last layer of SlotGAT, to learn the importance of different slots in downstream tasks. Our analysis indicates that the slots in SlotGAT can preserve different semantics in various feature spaces. The superiority of SlotGAT is evaluated against 13 baselines on 6 datasets for node classification and link prediction. Our code is at https://github.com/scottjiao/SlotGAT_ICML23/.

Please cite our paper if you use the code or data.

@InProceedings{pmlr-v202-zhou23j,
  title =     {SlotGAT: Slot-based Message Passing for Heterogeneous Graphs},
  author =    {Zhou, Ziang and Shi, Jieming and Yang, Renchi and Zou, Yuanhang and Li, Qing},
  booktitle = {Proceedings of the 40th International Conference on Machine Learning},
  pages =     {42644--42657},
  year =      {2023},
  volume =    {202},
  month =     {23--29 Jul},
  publisher = {PMLR}
}

Data and trained models

The data and trained models can be downloaded from the following link:

Place the data and trained models in the same directory structure as in the Google Drive folder linked above.

Scripts

To run the experiments, follow these steps.

1. cd into the sub-directory

For the node classification task:

cd ./NC/methods/SlotGAT

For the link prediction task:

cd ./LP/methods/SlotGAT

2. evaluate the trained model

python run_use_slotGAT_on_all_dataset.py

The results are then available in the ./NC/methods/SlotGAT/log or ./LP/methods/SlotGAT/log directory, respectively.

3. train the model

To train the model yourself, run the following script.

python run_train_slotGAT_on_all_dataset.py  
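The three steps above can also be driven from a single Python helper. The following is only a sketch: the directory and script names are taken from the steps above, while the `plan` and `run` functions and the "NC"/"LP" and "evaluate"/"train" keys are hypothetical names introduced here for illustration. It assumes it is launched from the repository root.

```python
import subprocess
import sys
from typing import List, Tuple

# Directories and scripts as listed in the steps above.
TASK_DIRS = {
    "NC": "./NC/methods/SlotGAT",  # node classification
    "LP": "./LP/methods/SlotGAT",  # link prediction
}
SCRIPTS = {
    "evaluate": "run_use_slotGAT_on_all_dataset.py",
    "train": "run_train_slotGAT_on_all_dataset.py",
}


def plan(task: str, mode: str) -> Tuple[str, List[str]]:
    """Return (working directory, command) for a task/mode pair (hypothetical helper)."""
    return TASK_DIRS[task], [sys.executable, SCRIPTS[mode]]


def run(task: str, mode: str) -> None:
    """Run the script inside its sub-directory, like `cd <dir>` then `python <script>`."""
    cwd, argv = plan(task, mode)
    subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Example: evaluate the trained node classification models.
    run("NC", "evaluate")
```

Running the scripts by hand as described above is equivalent; the helper only saves the `cd` step.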

Data format

  • All ids begin from 0.
  • Each node type takes a contiguous range of node_ids.
  • node_ids and node_type ids follow the same order, i.e. nodes with node_type 0 take the first range of node_ids, nodes with node_type 1 take the second range, and so on.
  • One-hot node features can be omitted.
  • For the node classification task, the node type of the target nodes is 0.
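The id layout rules above can be checked mechanically. The sketch below is not part of the repository: `type_ranges` is a hypothetical helper that takes a list in which position i holds the node type of node i (so ids start at 0 by construction) and returns the half-open id range of each type, raising an error if the types are not laid out in contiguous, ascending blocks.

```python
from typing import Dict, List, Tuple


def type_ranges(node_types: List[int]) -> Dict[int, Tuple[int, int]]:
    """Return {node_type: (start, end)} with half-open ranges [start, end).

    Raises ValueError if the types do not form contiguous blocks in the
    order 0, 1, 2, ... as the data format requires.
    """
    ranges: Dict[int, Tuple[int, int]] = {}
    prev_type = -1
    for node_id, t in enumerate(node_types):
        if t in ranges and t == prev_type:
            # Same block continues: extend its end.
            start, _ = ranges[t]
            ranges[t] = (start, node_id + 1)
        elif t == prev_type + 1:
            # Next type starts a new block right after the previous one.
            ranges[t] = (node_id, node_id + 1)
            prev_type = t
        else:
            raise ValueError(
                f"node {node_id} has type {t}; expected {prev_type} or {prev_type + 1}"
            )
    return ranges


if __name__ == "__main__":
    print(type_ranges([0, 0, 0, 1, 1, 2]))
```

For the example list, type 0 covers ids [0, 3), type 1 covers [3, 5), and type 2 covers [5, 6).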

Note

To be consistent with PubMed_NC, the data of PubMed_LP is re-organized, which makes it different from PubMed in HGB; all other datasets are the same as in HGB. Three changes are made:

  1. Node types 0 and 1 are swapped. Since the main type (the target node type in the node classification task) of PubMed_NC is 0, we swap node types 0 and 1 in PubMed_LP so that its main type is also 0.

  2. The node ids are re-ordered. After the previous change, the main type of PubMed_LP is 0. We re-order the nodes of PubMed_LP so that nodes of type 0 take the first range of node_ids, nodes of type 1 take the second range, and so on. For example, for type 0 the range is [0, num_of_type_0_nodes), for type 1 the range is [num_of_type_0_nodes, num_of_type_0_nodes + num_of_type_1_nodes), and so on.

  3. The corresponding src and dst node ids in links.dat and test.dat are re-mapped according to the new node ids.

In summary, we only perform node type swapping, the resulting node re-ordering, and link re-mapping. The node features and graph structure are unchanged, so the performance of SlotGAT on PubMed_LP is the same as on the original PubMed in HGB.
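The three changes can be illustrated on a toy graph. This is a minimal sketch of the idea, not the preprocessing code used to build PubMed_LP: `swap_and_reorder` is a hypothetical function, parsing of links.dat and test.dat is omitted, and the stable ordering within each type is an assumption.

```python
from typing import Dict, List, Tuple


def swap_and_reorder(
    node_types: List[int],
    links: List[Tuple[int, int]],
    type_a: int = 0,
    type_b: int = 1,
) -> Tuple[List[int], List[Tuple[int, int]], Dict[int, int]]:
    """Swap two node types, make each type contiguous, and remap link endpoints."""
    # Change 1: swap the two type labels.
    swap = {type_a: type_b, type_b: type_a}
    swapped = [swap.get(t, t) for t in node_types]

    # Change 2: order old ids by their new type. sorted() is stable, so nodes
    # keep their relative order inside each type (an assumption of this sketch).
    order = sorted(range(len(swapped)), key=lambda old_id: swapped[old_id])
    old_to_new = {old_id: new_id for new_id, old_id in enumerate(order)}
    new_types = [swapped[old_id] for old_id in order]

    # Change 3: remap src and dst of every link to the new ids.
    new_links = [(old_to_new[src], old_to_new[dst]) for src, dst in links]
    return new_types, new_links, old_to_new


if __name__ == "__main__":
    types = [0, 0, 1, 1, 1, 2]
    edges = [(0, 2), (1, 4), (3, 5)]
    print(swap_and_reorder(types, edges))
```

The set of edges is the same before and after, only the labels of their endpoints change, which is why the graph structure is unaffected.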

Required environment

  • python 3.10.9
  • pytorch 1.13.1
  • dgl 1.0.1+cu117
  • pytorch_geometric 2.2.0
  • cuda 11.7
  • networkx 2.8.4
  • scikit-learn 1.2.1
  • scipy 1.10.0
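To compare an existing environment against the list above, a small check with the standard library's `importlib.metadata` may help. This is a sketch under assumptions: the distribution names (for example `torch` for pytorch and `torch_geometric` for pytorch_geometric) are guesses at how the packages are registered in your environment, local build tags such as `+cu117` are ignored in the comparison, and the python and cuda versions are not checked.

```python
from importlib import metadata
from typing import Callable, Dict, Tuple

# Versions from the list above; the distribution names are assumptions.
REQUIRED: Dict[str, str] = {
    "torch": "1.13.1",
    "dgl": "1.0.1+cu117",
    "torch_geometric": "2.2.0",
    "networkx": "2.8.4",
    "scikit-learn": "1.2.1",
    "scipy": "1.10.0",
}


def installed_version(name: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def find_mismatches(
    required: Dict[str, str],
    lookup: Callable[[str], str] = installed_version,
) -> Dict[str, Tuple[str, str]]:
    """Return {name: (wanted, found)} for packages whose versions differ.

    Only the part before '+' is compared, so '1.13.1+cu117' matches '1.13.1'.
    """
    mismatches: Dict[str, Tuple[str, str]] = {}
    for name, wanted in required.items():
        found = lookup(name)
        if found.split("+")[0] != wanted.split("+")[0]:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(REQUIRED).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

An empty result only means the listed packages match; other versions may also work but are not covered by the list above.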