This is a Python/scikit-learn implementation of the Iterative Classification Algorithm (ICA) from:
Qing Lu, Lise Getoor, Link-based classification (ICML 2003)
which served as a semi-supervised classification baseline in our recent paper:
Thomas N. Kipf, Max Welling, Semi-Supervised Classification with Graph Convolutional Networks (2016)
This implementation is largely based on and adapted from: https://github.com/sskhandle/Iterative-Classification
python setup.py install
- sklearn
- networkx
python train.py
In order to use your own data, you have to provide
- an N by N adjacency matrix (N is the number of nodes),
- an N by D feature matrix (D is the number of features per node), and
- an N by E binary label matrix (E is the number of classes).
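As a rough illustration of the three inputs listed above, the following sketch builds them for a made-up toy graph. It uses dense NumPy arrays, an undirected graph, and one-hot label rows; all of these are assumptions for the example. The loader in utils.py may expect a different container type (for instance sparse matrices), so treat this as a guide to shapes only.

```python
import numpy as np

# Toy graph with N = 4 nodes, D = 3 features per node, E = 2 classes.
# All values below are made up for illustration.
edges = [(0, 1), (1, 2), (2, 3)]
num_nodes, num_classes = 4, 2

# N x N adjacency matrix; symmetric because the toy graph is assumed undirected.
adj = np.zeros((num_nodes, num_nodes), dtype=np.int64)
for i, j in edges:
    adj[i, j] = 1
    adj[j, i] = 1

# N x D feature matrix, e.g. bag-of-words indicators per node.
features = np.array([[1, 0, 1],
                     [0, 1, 0],
                     [1, 1, 0],
                     [0, 0, 1]], dtype=np.float64)

# N x E binary label matrix: one-hot rows built from integer class ids.
class_ids = np.array([0, 1, 1, 0])
labels = np.eye(num_classes, dtype=np.int64)[class_ids]

print(adj.shape, features.shape, labels.shape)
```

Each row of the label matrix contains a single 1 in the column of that node's class.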
Have a look at the load_data() function in utils.py for an example.
In this example, we load citation network data (Cora, Citeseer or Pubmed). The original datasets can be found here: http://linqs.cs.umd.edu/projects/projects/lbc/. In our version (see the data folder), we use the dataset splits provided by https://github.com/kimiyoung/planetoid (Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov, Revisiting Semi-Supervised Learning with Graph Embeddings, ICML 2016).
You can specify a dataset as follows:
python train.py -dataset citeseer
(or by editing train.py)
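For readers who want to see how a single-dash flag such as -dataset can be handled, here is a minimal argparse sketch. It is not taken from train.py: the function name parse_args, the default value "cora", and the restriction to three choices are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical parser; the real train.py may define its options differently.
    parser = argparse.ArgumentParser(description="Dataset selection sketch")
    parser.add_argument(
        "-dataset",
        default="cora",  # assumed default, not confirmed by the source
        choices=["cora", "citeseer", "pubmed"],
        help="citation network to load",
    )
    return parser.parse_args(argv)


# Equivalent to running: python train.py -dataset citeseer
args = parse_args(["-dataset", "citeseer"])
print(args.dataset)  # prints "citeseer"
```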