Label-GCN
A variation of GCN that allows the model to learn from labelled nodes in a graph. Paper available at https://arxiv.org/abs/2104.02153.
Overview
The implementation of Label-GCN relies on Keras and Stellargraph. The Stellargraph library has been
modified, with the main logic of Label-GCN contained in the file label_gcn.py, located at stellargraph/layer/label_gcn.py.
This modified Stellargraph library is available at https://github.com/cbellei/stellargraph and
is used as a submodule in this project (see next section).
Unfortunately, TensorFlow does not easily support sparse tensors within the tf.linalg
package (see tensorflow/tensorflow#27380 for some details). As a result,
the implementation provided in this project is currently inefficient for large graphs (such as the Elliptic dataset).
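To give a feel for why falling back to dense matrices hurts on large graphs, the sketch below compares the memory needed for a dense float32 adjacency matrix with a rough coordinate-format (COO) sparse estimate. The node and edge counts are illustrative assumptions, not figures taken from this project, and the helper names are hypothetical.

```python
def dense_adjacency_bytes(num_nodes, bytes_per_value=4):
    """Memory for a dense num_nodes x num_nodes matrix (float32 by default)."""
    return num_nodes * num_nodes * bytes_per_value


def coo_adjacency_bytes(num_entries, bytes_per_value=4, bytes_per_index=8):
    """Rough memory for a COO sparse matrix: one value and two indices per stored entry."""
    return num_entries * (bytes_per_value + 2 * bytes_per_index)


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not taken from the repository).
    num_nodes, num_entries = 200_000, 250_000
    gib = 1024 ** 3
    mib = 1024 ** 2
    print(f"dense: {dense_adjacency_bytes(num_nodes) / gib:.1f} GiB")
    print(f"sparse (COO): {coo_adjacency_bytes(num_entries) / mib:.1f} MiB")
```

The dense cost grows with the square of the number of nodes, while the sparse cost grows with the number of stored edges, which is why the gap widens quickly on large graphs.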
Installation
- Tested with Python 3.6
- Clone the repository and add the Stellargraph submodule, modified with the addition of Label-GCN
git clone https://github.com/cbellei/LabelGCN.git
cd LabelGCN
git submodule init
git submodule update
- Set up the environment (Anaconda):
conda create -n LabelGCN python=3.6
conda activate LabelGCN
pip install -r requirements.txt
cd stellargraph
pip install -e .
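After the steps above, a quick check that the key packages are visible in the active environment can save time before launching an experiment. This is a minimal sketch: the helper missing_packages is hypothetical, and the module names tensorflow and stellargraph are assumptions based on the libraries named in this README.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names are assumptions based on the libraries named in this README.
    missing = missing_packages(["tensorflow", "stellargraph"])
    print("All packages found." if not missing else f"Missing: {missing}")
```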
Datasets
The CORA, Citeseer and Pubmed datasets are available through the Stellargraph library. The Elliptic dataset is
available at https://www.kaggle.com/ellipticco/elliptic-data-set. This project expects the Elliptic dataset to be located under a directory named elliptic_bitcoin_dataset.
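Before starting a long Elliptic run, it may help to confirm that the dataset directory is in place. The sketch below assumes the elliptic_bitcoin_dataset directory sits under some base directory that you pass in; the README does not state where the project looks for it, so base_dir and the helper name find_elliptic_csvs are assumptions.

```python
from pathlib import Path


def find_elliptic_csvs(base_dir="."):
    """Return sorted CSV file names under <base_dir>/elliptic_bitcoin_dataset.

    Raises FileNotFoundError if the directory is missing.
    """
    data_dir = Path(base_dir) / "elliptic_bitcoin_dataset"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"Expected dataset directory at {data_dir}")
    return sorted(p.name for p in data_dir.glob("*.csv"))


if __name__ == "__main__":
    try:
        print(find_elliptic_csvs("."))
    except FileNotFoundError as err:
        print(err)
```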
Running the experiments
The transductive experiments of Tables 3 and 4 can be reproduced by running the file experiments_transductive.py. The dataset, number of random states
and number of runs for each random state are set by the flags ds, ns and nr respectively.
It is advisable to run with ds=cora, ns=1 and nr=1 for the quickest run. This results in the following command:
python src/experiments_transductive.py -ds cora -ns 1 -nr 1
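For readers who want to see how the three flags relate to the loop over random states and runs, here is a minimal sketch of a parser with the same flag names. It is not the code in experiments_transductive.py: the defaults, help strings and the build_parser helper are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the -ds, -ns and -nr flags described above."""
    parser = argparse.ArgumentParser(description="Transductive experiments (sketch)")
    parser.add_argument("-ds", default="cora", help="dataset name (assumed default)")
    parser.add_argument("-ns", type=int, default=1, help="number of random states")
    parser.add_argument("-nr", type=int, default=1, help="runs per random state")
    return parser


if __name__ == "__main__":
    # Same arguments as the quickest command above, passed explicitly.
    args = build_parser().parse_args(["-ds", "cora", "-ns", "1", "-nr", "1"])
    for state in range(args.ns):
        for run in range(args.nr):
            print(f"dataset={args.ds} random_state={state} run={run}")
```

Under this reading, the total number of training runs is ns multiplied by nr, so both flags scale the running time linearly.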
The inductive experiment of Table 5 can be reproduced by running the file experiments_inductive.py. Three flags can be set in this case: ns
determines the number of random states, as before, while nr1 and nr2 set the number of runs for the standalone classical machine
learning models and for the classical machine learning models with the addition of the GCN/LabelGCN embeddings, respectively. In the latter case,
for each set of embeddings (set via ns), a number of runs nr2 is performed. For this experiment, the quickest run results from the command:
python src/experiments_inductive.py -ns 1 -nr1 1 -nr2 1
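The way ns, nr1 and nr2 combine can be sketched as a simple run plan. This is one reading of the description above, not the logic of experiments_inductive.py: it assumes the standalone models are run nr1 times in total and that each of the ns embedding sets gets nr2 runs. The function name and tuple layout are hypothetical.

```python
def inductive_run_plan(ns, nr1, nr2):
    """Enumerate runs as (kind, random_state, run_index) tuples.

    Assumes standalone models run nr1 times in total and that each of the
    ns embedding sets gets nr2 runs, per the description above.
    """
    plan = [("standalone", None, run) for run in range(nr1)]
    for state in range(ns):
        plan.extend(("with_embeddings", state, run) for run in range(nr2))
    return plan


if __name__ == "__main__":
    for entry in inductive_run_plan(ns=1, nr1=1, nr2=1):
        print(entry)
```

Under these assumptions the plan contains nr1 + ns * nr2 entries, which gives a rough idea of how the flags affect total running time.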
NOTE: All runs involving the Elliptic dataset are computationally demanding. Producing Tables 4 and 5 of the paper required running on a server for several hours.