/node2vec-surprises

Code for "On the Surprising Behaviour of node2vec"

Primary LanguageJupyter NotebookBSD 3-Clause "New" or "Revised" LicenseBSD-3-Clause

On the Surprising Behaviour of node2vec

GitHub contributors GitHub

This is the code to our paper on the [ICML 2022 Workshop on Topology, Algebra, and Geometry in Machine Learning]:

@unpublished{Hacker22a,
  title         = {On the Surprising Behaviour of \texttt{node2vec}},
  author        = {Celia Hacker and Bastian Rieck},
  year          = {2022},
  eprint        = {2206.08252},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  type          = {Preprint},
  repository    = {https://github.com/aidos-lab/node2vec-surprises},
}

Installation

We suggest using poetry to install the code:

$ poetry install

We have tested this repository with Python 3.9 and Python 3.8.1.

Examples

Creating some embeddings

Running the code (example):

# Let's run for 50 epochs, generate 16-dimensional embeddings,
# and keep everything at default parameters.
$ python node2vec.py -d 16 -e 50

Creating a gallery of embeddings

# You can select different point clouds to visualise here. The
# gallery script is sufficiently smart to enlarge its grid.
$ python gallery.py ../results/lm/*-d64*.tsv

Analysing distributions

# The script is smart enough to check whether pairwise distances
# can be calculated and compared here.
$ python analyse_results.py ../results/lm/*.tsv  --hue dimension --function mean_distance
$ python analyse_results.py ../results/lm/*.tsv  --hue dimension --function hausdorff
$ python analyse_results.py ../results/lm/*.tsv  --hue dimension --function wasserstein

Alternatively, we can also visualise kernel density estimates of intra-group and inter-group distances:

$ python analyse_results.py ../results/lm/*-d64*.tsv  --hue dimension --function wasserstein -d

Note that this only works for pairwise distance metrics such as the Wasserstein distance.

Quality assessment

To perform a rudimentary quality assessment analysis, you need to provide a set of embeddings as well as an adjacency matrix to the analysis script:

$ python analyse_results.py ../results/lm/*.tsv  --hue dimension --function link_distributions --adjacency ../results/lm/A.txt