An Attributed Data Model for Graph Learning Algorithms
Junteng Jia and Austin R. Benson
arXiv:2101.07730, 2021.
Our paper proposes a generative model that unifies traditional label propagation algorithms and graph neural networks. The model assumes that node features and labels are jointly sampled from a Gaussian Markov random field (MRF). We can therefore derive discriminative learning algorithms by computing the expectation of the unknown nodes' outcomes conditioned on different sets of observations (a minimal sketch of this view follows the list below). In particular, the expectation of the unknown outcomes
- conditioned on the observed outcomes ⟶ Label Propagation.
- conditioned on all the features ⟶ Linear Graph Convolution.
- conditioned on the observed outcomes and the features ⟶ Linear Graph Convolution + Residual Propagation.
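As a minimal sketch of this view (the helper `conditional_mean`, the index sets `ll`/`uu`, and the precision matrix `Λ` are our illustrative names, not the repo's API), the first bullet is literally the conditional mean of a zero-mean Gaussian:

```julia
using LinearAlgebra

# For a zero-mean Gaussian with precision matrix Λ, partitioned into observed
# (ll) and unknown (uu) nodes, the conditional mean of the unknown outcomes is
#   E[y_U | y_L = yl] = -Λ_UU⁻¹ Λ_UL yl,
# i.e., a label-propagation-style smoothing of the observed outcomes.
function conditional_mean(Λ::AbstractMatrix, yl::AbstractVector, ll, uu)
    return -(Λ[uu, uu] \ (Λ[uu, ll] * yl))
end
```

Conditioning on the features instead of, or in addition to, the observed outcomes gives the other two bullets.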
Our code is tested with Julia 1.4.1. You can install all dependencies by running:
```bash
julia env.jl
```
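(env.jl installs the dependencies through Julia's built-in package manager; the snippet below only illustrates that pattern — the actual package list lives in env.jl.)

```julia
# illustrative only: install dependencies via Pkg; see env.jl for the real list
using Pkg
Pkg.add(["LightGraphs"])   # hypothetical subset of the dependencies
```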
Our code implements the three algorithms outlined above, as well as some baselines and variants. To use our code for your own prediction problem, the following information is required (a toy sketch follows the list):
- G: the topology of your graph as a LightGraphs object
- feats: an array of feature vectors
- labels: an array of real-valued outcomes
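A toy sketch of these three inputs (all values below are placeholders):

```julia
using LightGraphs

# graph topology: a small path graph as a LightGraphs object
G = SimpleGraph(4)
add_edge!(G, 1, 2); add_edge!(G, 2, 3); add_edge!(G, 3, 4)

feats = [randn(5) for _ in 1:nv(G)]   # one feature vector per node
labels = randn(nv(G))                 # one real-valued outcome per node
```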
All prediction algorithms considered in our paper are implemented as a single function run_dataset with different options (e.g., the smoothing level α, the nonlinear activation σ, and the number of GCN layers K). Each algorithm decomposes into three steps: 1) feature pre-processing, 2) inductive prediction, and 3) residual propagation. The options corresponding to the algorithms considered in our paper are summarized below (a sketch of steps 1 and 3 follows the table):
| Options | feature_smoothing | α     | predictor | σ        | K         | residual_propagation | α     |
|---------|-------------------|-------|-----------|----------|-----------|----------------------|-------|
| LP      | false             |       | mean      |          |           | true                 | [0,1] |
| LR      | false             |       | mlp       | identity |           | false                |       |
| LGC     | true              | [0,1] | mlp       | identity |           | false                |       |
| SGC     | false             |       | gcn       | identity | {1,2,...} | false                |       |
| GCN     | false             |       | gcn       | relu     | {1,2,...} | false                |       |
| LGC/RP  | true              | [0,1] | mlp       | identity |           | true                 | [0,1] |
| SGC/RP  | false             |       | gcn       | identity | {1,2,...} | true                 | [0,1] |
| GCN/RP  | false             |       | gcn       | relu     | {1,2,...} | true                 | [0,1] |

Here, feature_smoothing and the first α belong to step 1; predictor, σ, and K to step 2; and residual_propagation and the second α to step 3.
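To make steps 1 and 3 concrete, here is a rough sketch that assumes both are built from the normalized graph Laplacian (the helper names and the exact operator are ours; see the paper for the precise form):

```julia
using LightGraphs, LinearAlgebra

# normalized Laplacian L = I - D^{-1/2} A D^{-1/2} (assumes no isolated nodes)
function normalized_laplacian_dense(G)
    A = Matrix(adjacency_matrix(G))
    Dinv = Diagonal(1 ./ sqrt.(degree(G)))
    return I - Dinv * A * Dinv
end

# step 1 sketch: smooth an n × d feature matrix X by solving (I + αL) X̄ = X
smooth_features(G, X, α) = (I + α * normalized_laplacian_dense(G)) \ X

# step 3 sketch: propagate the training residuals ŷ_L - y_L to the test
# nodes with the same Gaussian conditional mean used by label propagation
function propagate_residuals!(G, ŷ, y, ll, uu, α)
    Λ = I + α * normalized_laplacian_dense(G)
    ŷ[uu] .+= Λ[uu, uu] \ (Λ[uu, ll] * (ŷ[ll] - y[ll]))
    return ŷ
end
```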
For example, to run LGC/RP on the 2016 U.S. county dataset to predict election outcomes, you can simply do the following:
```julia
# load the graph, outcomes, and features (the inputs listed above)
G, _, labels, feats = read_network("county_facebook_2016_election");

# random split: 30% training, 70% testing
ll, uu = rand_split(nv(G), [0.3, 0.7]);

# prediction routine
run_dataset(G, feats, labels, ll, uu;
            feature_smoothing=true, α=0.80, predictor="mlp", σ=identity,
            residual_propagation=true)
```
Please see example.jl for more details.
In order to fit a dataset to our proposed Gaussian MRF, one can simply run the following:
```julia
# fit the dataset to a Gaussian MRF
fit_gmrf("county_facebook_2016_election")
```
This function calls read_network internally to collect the graph topology and node attributes, and it prints out all the fitted Gaussian MRF parameters once it finishes.
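Conceptually, the fit maximizes the Gaussian MRF log-likelihood of the observed attributes. Schematically (our notation, with z the stacked attributes and Λ the parameterized precision matrix; the repo's actual parameterization is richer):

```julia
using LinearAlgebra

# Gaussian log-density of a zero-mean sample z with precision matrix Λ,
# up to an additive constant; fitting tunes the parameters inside Λ
loglikelihood(Λ, z) = 0.5 * (logdet(Λ) - dot(z, Λ * z))
```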
To use our code on your own data, simply add your dataset to the loader read_network. You will likely want to normalize each node feature (as well as the outcome) to have zero mean; a minimal sketch is given below. For a simple example of how to write a data loader, see read_ward.
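For instance, a minimal way to zero-center features and outcomes inside a loader (variable names as in the list of required inputs above):

```julia
using Statistics

F = reduce(hcat, feats)            # d × n matrix, one column per node
F .-= mean(F, dims=2)              # zero-center each feature dimension
feats = [F[:, i] for i in 1:size(F, 2)]
labels = labels .- mean(labels)    # zero-center the outcomes
```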
The experiments in our paper can be reproduced by running:

```bash
bash run.sh
```

which writes the outputs to /results. This can take a long time due to the hyperparameter scans.
If you have any questions, please email jj585@cornell.edu.