Link Prediction Experiments
This repository contains a series of machine learning experiments for link prediction within social networks.
We first implement and apply a variety of link prediction methods to each of the ego networks contained within the SNAP Facebook dataset and SNAP Twitter dataset, as well as to various random networks generated using networkx, and then calculate and compare the ROC AUC, Average Precision, and runtime of each method.
If you use this repository in your work, please cite the corresponding DOI:
Link Prediction Methods Tested:
- (Variational) Graph Auto-Encoders: An end-to-end trainable convolutional neural network model for unsupervised learning on graphs
- Node2Vec/DeepWalk: A skip-gram based approach to learning node embeddings from random walks within a given graph
- Spectral Clustering: Using spectral embeddings to create node representations from an adjacency matrix
- Baseline Indexes: Adamic-Adar, Jaccard Coefficient, Preferential Attachment
Requirements
- python 2.7
- TensorFlow (1.0 or later)
- networkx
- gensim
- scikit-learn
- scipy
- jupyter notebook
- pandas
Pre-Use Installation
python setup.py install
Included Files
Network Data
facebook/
: Original Facebook ego networks dataset, with added .allfeats files (with both ego and alter features)fb-processed/
: Pickle dumps of (adjacency_matrix, feature_matrix) tuples for each ego network, and for combined networktwitter/
: Twitter ego networks dataset (combined), with adjacency matrix pickle dumpvisualizations/
: Visualizations of each network generated by networkx and matplotlibnetwork-statistics/
: .txt and .pkl files of pre-calculated network characteristics (with info on connectivity, network size, etc.) for each networktrain-test-splits/
: Pickle dumps of pre-processed train-test splits for Facebook ego networks, with varying degrees of visibility (i.e. how many edges are hidden). Includes:adj_train
,train_edges
,train_edges_false
,val_edges
,val_edges_false
,test_edges
,test_edges_false
process-ego-networks.py
: Script used to process raw Facebook data and generate pickle dumpsprocess-combined-network.py
: Script used to combine Facebook ego networks and generate complete network pickle dumpprocess-twitter-network.py
: Script used to process raw Twitter data and generate pickle dumpsfb-train-test-splits.py
: Script used to generate and store train-test splits for each Facebook ego networktwitter-train-test-splits.py
: Script used to generate and store train-test-splits for Twitter combined network
Annotated Link Prediction IPython Notebooks
link-prediction-baselines.ipynb
: Adamic-Adar, Jaccard Coefficient, Preferential Attachmentspectral-clustering.ipynb
: Using spectral embeddings for link predictionnode2vec.ipynb
: Skip-gram based representation learning for node/edge embeddingsgraph-vae.ipynb
: (Variational) Graph Autoencoder, learns node embeddings to recreate adjacency matrix
Link Prediction Helper Scripts
link_prediction_scores.py
: Utility functions for running various link prediction tests
Exploratory Analysis
network-visualizations.ipynb
: Generate .pdf visualizations for each network, in addition to calculating and storing a variety of network metrics (e.g. transtivity, avg. clustering coefficient, etc.)
Full Link Prediction Experiments
nx-graph-experiments.ipynb
: Run all link prediction tests on various types of random networks (Erdos-Renyi, Barabasi-Albert, etc.)fb-graph-experiments.ipynb
: Run all link prediction tests on each Facebook ego networkrun-all-experiments.py
: Run all link prediction experiments (on both Facebook networks and random networkx networks), save results as pickle dumps inresults
run-twitter-experiments.py
: Run all link prediction experiments on combined (directed) Twitter network. Warning: may take a while to run.
Results
results/
: Pickle dumps of experiment results, with results (ROC AUC, ROC Curve, Avg. Precision, Runtime) stored for each link prediction method in Python dictionary forminvestigate-results.ipynb
: Generate and save bar plots/ROC curve plots for each method and graph type, save network characteristics to.txt
filesresult-plots-by-graph/
: Bar plots for the results of each experiment (ROC AUC, AP, Minimum Runtime) organized by network, in .pdf formresult-plots-by-algorithm/
: Bar plots for the AUROC results of each experiment organized by link prediction algorithm, in .pdf form