
Autoencoders for Link Prediction and Semi-Supervised Node Classification

Primary language: Python. License: MIT.

Local Neighborhood Graph Autoencoders

This is a Keras implementation of the symmetrical autoencoder architecture with parameter sharing for the tasks of link prediction and semi-supervised node classification, as described in the paper:

Tran, Phi Vu: Learning to Make Predictions on Graphs with Autoencoders. Proceedings of the 5th IEEE International Conference on Data Science and Advanced Analytics (2018).

[Figure: FCN_schematic]
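To make "symmetrical autoencoder with parameter sharing" more concrete, here is a minimal NumPy sketch of a forward pass. It assumes that parameter sharing means tied weights, so each decoder layer reuses the transpose of the matching encoder weight matrix, and that the model input is a node's row of the adjacency matrix. The class name TiedAutoencoder, the layer sizes, and the ReLU/sigmoid activations are illustrative choices, not taken from the repository's Keras code.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TiedAutoencoder(object):
    """Hypothetical symmetric autoencoder: the decoder has no weight matrices
    of its own and reuses the transposed encoder weights (an assumed reading
    of "parameter sharing")."""

    def __init__(self, n_nodes, hidden_sizes=(256, 128), seed=0):
        rng = np.random.RandomState(seed)
        sizes = [n_nodes] + list(hidden_sizes)
        # One weight matrix per encoder layer, e.g. (N x 256), (256 x 128).
        self.weights = [rng.normal(0.0, 0.01, size=(sizes[i], sizes[i + 1]))
                        for i in range(len(sizes) - 1)]
        self.enc_biases = [np.zeros(s) for s in sizes[1:]]
        # Biases are NOT shared in this sketch; the decoder gets its own.
        self.dec_biases = [np.zeros(s) for s in reversed(sizes[:-1])]

    def encode(self, a):
        """Map adjacency rows (batch x N) to latent codes."""
        h = a
        for W, b in zip(self.weights, self.enc_biases):
            h = relu(h @ W + b)
        return h

    def decode(self, z):
        """Mirror the encoder using transposed encoder weights."""
        h = z
        layers = list(zip(reversed(self.weights), self.dec_biases))
        for i, (W, b) in enumerate(layers):
            h = h @ W.T + b
            # ReLU on hidden layers, sigmoid on the output so entries are
            # interpretable as edge probabilities.
            h = sigmoid(h) if i == len(layers) - 1 else relu(h)
        return h

    def reconstruct(self, a):
        return self.decode(self.encode(a))


if __name__ == "__main__":
    # Toy 5-node graph; each row of A is one node's local neighborhood.
    A = (np.random.RandomState(1).rand(5, 5) > 0.5).astype(float)
    model = TiedAutoencoder(n_nodes=5, hidden_sizes=(4, 2))
    print(model.reconstruct(A).shape)  # (5, 5)
```

Tying the weights roughly halves the number of trainable weight matrices compared with an untied encoder/decoder pair of the same shape.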

Requirements

The code has been tested on Ubuntu 16.04 with the following components:

Software

  • Python 2.7
  • Keras 2.0.6 with the TensorFlow GPU 1.1.0 backend
  • CUDA 8.0 with cuDNN 5.1
  • NetworkX 1.11
  • NumPy 1.11
  • SciPy 0.17.0
  • Scikit-Learn 0.18.1

Hardware

  • Intel Xeon CPU with 32 cores
  • 64GB of system RAM
  • NVIDIA GeForce GTX TITAN X GPU with 12GB of VRAM

Datasets

Citation networks from Thomas Kipf and Max Welling (2016), "Semi-Supervised Classification with Graph Convolutional Networks":

  • Cora, Citeseer, Pubmed

Collaboration and social networks from Wang et al. (2016), "Structural Deep Network Embedding":

  • Arxiv-GRQC, BlogCatalog

Miscellaneous networks from Aditya Krishna Menon and Charles Elkan (2011), "Link Prediction via Matrix Factorization":

  • Protein, Metabolic, Conflict, PowerGrid

For custom graph datasets, the following are required:

  • N x N adjacency matrix (N is the number of nodes) [required for link prediction],
  • N x F matrix of node features (F is the number of features per node) [optional for link prediction],
  • N x C matrix of one-hot label classes (C is the number of classes) [required for node classification].

For an example of how to prepare the input dataset, take a look at the load_citation_data() function in utils_gcn.py.
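As a rough illustration of the three inputs listed above, the sketch below assembles them from an undirected edge list, an optional feature array, and optional integer class ids. The helper name build_inputs is hypothetical, and returning the adjacency and features as SciPy sparse matrices is an assumption; check load_citation_data() in utils_gcn.py for the exact types and shapes the training scripts expect.

```python
import numpy as np
import scipy.sparse as sp


def build_inputs(n_nodes, edges, features=None, labels=None, n_classes=None):
    """Hypothetical helper that assembles the matrices a custom dataset needs.

    edges:    non-empty iterable of (u, v) pairs, 0-based node ids, undirected.
    features: optional array-like of shape (N, F).
    labels:   optional sequence of N integer class ids in [0, C).
    Returns (adj, feats, onehot); feats/onehot are None when not given.
    """
    rows, cols = zip(*edges)
    data = np.ones(len(rows))
    adj = sp.coo_matrix((data, (rows, cols)), shape=(n_nodes, n_nodes))
    # Symmetrize (undirected graph) and clip duplicate edges to a 0/1 matrix.
    adj = ((adj + adj.T) > 0).astype(np.float32).tocsr()

    feats = None
    if features is not None:
        feats = sp.csr_matrix(np.asarray(features, dtype=np.float32))
        assert feats.shape[0] == n_nodes, "need one feature row per node"

    onehot = None
    if labels is not None:
        labels = np.asarray(labels)
        n_cls = n_classes if n_classes is not None else labels.max() + 1
        # N x C one-hot label matrix.
        onehot = np.zeros((n_nodes, n_cls), dtype=np.float32)
        onehot[np.arange(n_nodes), labels] = 1.0
    return adj, feats, onehot
```

For a link-prediction-only dataset, features and labels can simply be left as None.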

Usage

For training and evaluation, execute the following bash commands in the same directory where the code resides:

# Set the PYTHONPATH environment variable
$ export PYTHONPATH="/path/to/this/repo:$PYTHONPATH"

# Train the autoencoder model for network reconstruction
# using only latent features learned from local graph topology.
$ python train_reconstruction.py <dataset_str> <gpu_id>

# Train the autoencoder model for link prediction using
# only latent features learned from local graph topology.
$ python train_lp.py <dataset_str> <gpu_id>

# Train the autoencoder model for link prediction using
# both latent graph features and available explicit node features.
$ python train_lp_with_feats.py <dataset_str> <gpu_id>

# Train the autoencoder model for the multi-task
# learning of both link prediction and semi-supervised
# node classification, simultaneously.
$ python train_multitask_lpnc.py <dataset_str> <gpu_id>
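The last command trains link prediction and semi-supervised node classification simultaneously. One way to write such a joint objective is sketched below in NumPy: a binary cross-entropy over the adjacency reconstruction (optionally restricted to known entries) plus a categorical cross-entropy computed only over labeled nodes, so unlabeled nodes contribute no classification signal. The function name masked_multitask_loss, the equal weighting of the two terms, and the adj_mask/label_mask arguments are assumptions for illustration; the loss used by train_multitask_lpnc.py and the paper may weight or balance the terms differently.

```python
import numpy as np


def masked_multitask_loss(adj_true, adj_pred, y_true, y_pred, label_mask,
                          adj_mask=None, eps=1e-7):
    """Hypothetical joint loss: reconstruction BCE + masked classification CE.

    adj_true, adj_pred: (N x N) targets and predicted edge probabilities.
    y_true, y_pred:     (N x C) one-hot labels and predicted class probabilities.
    label_mask:         length-N booleans, True for nodes whose label is known.
    adj_mask:           optional (N x N) booleans, True for observed entries.
    """
    # Reconstruction term: binary cross-entropy over observed adjacency entries.
    p = np.clip(adj_pred, eps, 1.0 - eps)
    bce = -(adj_true * np.log(p) + (1.0 - adj_true) * np.log(1.0 - p))
    known = (np.ones(adj_true.shape, dtype=bool) if adj_mask is None
             else np.asarray(adj_mask, dtype=bool))
    recon = bce[known].mean()

    # Classification term: cross-entropy averaged over labeled nodes only.
    q = np.clip(y_pred, eps, 1.0)
    per_node = -np.sum(y_true * np.log(q), axis=1)
    mask = np.asarray(label_mask, dtype=bool)
    clf = per_node[mask].mean() if mask.any() else 0.0

    # Equal weighting of the two tasks (an assumption of this sketch).
    return recon + clf
```

Because unlabeled rows are dropped before averaging, a wrong prediction on an unlabeled node leaves the loss unchanged, which is what makes the classification part semi-supervised.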

The argument <dataset_str> must be one of the following nine supported dataset strings: protein, metabolic, conflict, powergrid, cora, citeseer, pubmed, arxiv-grqc, blogcatalog. The argument <gpu_id> is the GPU device ID, which is 0 if only one GPU is available.
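When launching many runs, it can help to validate the two positional arguments before calling a training script. The Python sketch below is a hypothetical helper, not part of the repository: it checks <dataset_str> against the nine supported strings, builds the same command line shown above, and prepends the repository directory to PYTHONPATH the way the export line does. The names SUPPORTED_DATASETS, build_command, and run are illustrative.

```python
import os
import subprocess

# The nine dataset strings listed above.
SUPPORTED_DATASETS = ("protein", "metabolic", "conflict", "powergrid",
                      "cora", "citeseer", "pubmed", "arxiv-grqc", "blogcatalog")


def build_command(script, dataset_str, gpu_id=0):
    """Return the argv list for one training script, e.g. train_lp.py."""
    if dataset_str not in SUPPORTED_DATASETS:
        raise ValueError("unsupported dataset: %r" % (dataset_str,))
    if int(gpu_id) < 0:
        raise ValueError("gpu_id must be a non-negative integer")
    return ["python", script, dataset_str, str(int(gpu_id))]


def run(script, dataset_str, gpu_id=0, repo_dir="."):
    """Run a training script from the repo directory with PYTHONPATH set."""
    repo_dir = os.path.abspath(repo_dir)
    env = dict(os.environ)
    # Mirrors: export PYTHONPATH="/path/to/this/repo:$PYTHONPATH"
    env["PYTHONPATH"] = repo_dir + os.pathsep + env.get("PYTHONPATH", "")
    return subprocess.call(build_command(script, dataset_str, gpu_id),
                           cwd=repo_dir, env=env)
```

For example, run("train_lp.py", "cora", gpu_id=0, repo_dir="/path/to/this/repo") corresponds to the export and train_lp.py commands above, and a typo such as "Cora" fails immediately instead of inside the training script.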

Citation

If you find this work useful, please cite the following:

@inproceedings{Tran-LoNGAE:2018,
  author={Tran, Phi Vu},
  title={Learning to Make Predictions on Graphs with Autoencoders},
  booktitle={5th IEEE International Conference on Data Science and Advanced Analytics},
  year={2018}
}