/link-prediction

Representation learning for link prediction within social networks

Primary LanguageJupyter NotebookMIT LicenseMIT

Link Prediction Experiments

This repository contains a series of machine learning experiments for link prediction within social networks.

We first implement and apply a variety of link prediction methods to each of the ego networks contained within the SNAP Facebook dataset and SNAP Twitter dataset, as well as to various random networks generated using networkx, and then calculate and compare the ROC AUC, Average Precision, and runtime of each method.

If you use this repository in your work, please cite the corresponding DOI:

DOI

Link Prediction Methods Tested:

Requirements

Pre-Use Installation

python setup.py install

Included Files

Network Data

  • facebook/: Original Facebook ego networks dataset, with added .allfeats files (with both ego and alter features)
  • fb-processed/: Pickle dumps of (adjacency_matrix, feature_matrix) tuples for each ego network, and for combined network
  • twitter/: Twitter ego networks dataset (combined), with adjacency matrix pickle dump
  • visualizations/: Visualizations of each network generated by networkx and matplotlib
  • network-statistics/: .txt and .pkl files of pre-calculated network characteristics (with info on connectivity, network size, etc.) for each network
  • train-test-splits/: Pickle dumps of pre-processed train-test splits for Facebook ego networks, with varying degrees of visibility (i.e. how many edges are hidden). Includes: adj_train, train_edges, train_edges_false, val_edges, val_edges_false, test_edges, test_edges_false
  • process-ego-networks.py: Script used to process raw Facebook data and generate pickle dumps
  • process-combined-network.py: Script used to combine Facebook ego networks and generate complete network pickle dump
  • process-twitter-network.py: Script used to process raw Twitter data and generate pickle dumps
  • fb-train-test-splits.py: Script used to generate and store train-test splits for each Facebook ego network
  • twitter-train-test-splits.py: Script used to generate and store train-test-splits for Twitter combined network

Annotated Link Prediction IPython Notebooks

  • link-prediction-baselines.ipynb: Adamic-Adar, Jaccard Coefficient, Preferential Attachment
  • spectral-clustering.ipynb: Using spectral embeddings for link prediction
  • node2vec.ipynb: Skip-gram based representation learning for node/edge embeddings
  • graph-vae.ipynb: (Variational) Graph Autoencoder, learns node embeddings to recreate adjacency matrix

Link Prediction Helper Scripts

  • link_prediction_scores.py: Utility functions for running various link prediction tests

Exploratory Analysis

  • network-visualizations.ipynb: Generate .pdf visualizations for each network, in addition to calculating and storing a variety of network metrics (e.g. transtivity, avg. clustering coefficient, etc.)

Full Link Prediction Experiments

  • nx-graph-experiments.ipynb: Run all link prediction tests on various types of random networks (Erdos-Renyi, Barabasi-Albert, etc.)
  • fb-graph-experiments.ipynb: Run all link prediction tests on each Facebook ego network
  • run-all-experiments.py: Run all link prediction experiments (on both Facebook networks and random networkx networks), save results as pickle dumps in results
  • run-twitter-experiments.py: Run all link prediction experiments on combined (directed) Twitter network. Warning: may take a while to run.

Results

  • results/: Pickle dumps of experiment results, with results (ROC AUC, ROC Curve, Avg. Precision, Runtime) stored for each link prediction method in Python dictionary form
  • investigate-results.ipynb: Generate and save bar plots/ROC curve plots for each method and graph type, save network characteristics to .txt files
  • result-plots-by-graph/: Bar plots for the results of each experiment (ROC AUC, AP, Minimum Runtime) organized by network, in .pdf form
  • result-plots-by-algorithm/: Bar plots for the AUROC results of each experiment organized by link prediction algorithm, in .pdf form