/sympa

Embedding graphs in symmetric spaces

Primary LanguagePython

sympa: Symmetric Spaces for Graph Embeddings

Code for the papers "Symmetric Spaces for Graph Embeddings: A Finsler-Riemannian Approach" published at ICML 2021, and the paper "Hermitian Symmetric Spaces for Graph Embeddings" published at DiffGeo4DL @ NeurIPS 2020.

Available Models

Vector models:

  • euclidean
  • poincare
  • lorentz
  • sphere
  • prod-hysph: Product of hyperbolic x sphere
  • prod-hyhy: Product of hyperbolic x hyperbolic
  • prod-hyeu: Product of hyperbolic x euclidean

Matrix models:

  • spd: Symmetric positive definite matrices space
  • upper: Upper half space model of the Siegel space
  • bounded: Bounded domain model of the Siegel space
  • dual: Compact dual

The last three allow different metrics

Available metrics

  • riem: Riemannian metric
  • fone: Finsler One
  • finf: Finsler Infinity
  • fmin: Finsler metric of minimum entropy
  • wsum: Learns weights for a weighted sum of the vector-valued distance

Requirements

  • Python == 3.7
  • Pytorch == 1.5.1: conda install pytorch==1.5.1 torchvision==0.6.1 [cpuonly | cudatoolkit=10.2] -c pytorch. In CPU environments, data parallel is not stable with pytorch >= 1.6
  • Geoopt >= 0.3.1: install from repository is advised: pip install git+https://github.com/geoopt/geoopt.git
  • XiTorch: for working with the compact dual only
  • networkx and networkit: for preprocessing only
  • matplotlib: for preprocessing only
  • tensorboardx
  • tqdm

Running experiments

1. Preprocess Data

In all preprocessing cases, the option --run_id=RUN_ID is required. The data will be saved in data/RUN_ID. If --plot_graph is passed, a plot will be generated, but the plotting can take a long time if the graph is large.

Grids:

python preprocess.py --graph=grid --grid_dims=DIMS --nodes=NODES --run_id=RUN_ID 

It will create a grid of DIMS dimensions with int(NODES^(1/DIMS)) nodes.
Ex: python preprocess.py --graph=grid --grid_dims=3 --nodes=27 will create a 3x3x3 cube graph

Trees

python preprocess.py --graph=tree --tree_branching=BRANCHING --tree_height=HEIGHT --run_id=RUN_ID 

It will create a tree with branching factor BRANCHING and height HEIGHT.

Cartesian or Rooted products

By default it will create a cartesian/rooted product of a tree and a grid, but it can be modified by changing the order in the code

python preprocess.py --graph=product-cartesian --grid_dims=DIMS --nodes=NODES --tree_branching=BRANCHING --tree_height=HEIGHT --run_id=RUN_ID 

It will create a cartesian (rooted with --graph=product-rooted) product of the specified tree and grid.

Social Networks

python preprocess.py --graph=TYPE --run_id=RUN_ID 

Current available options are social-karate, social-davis, social-florentine, social-miserables. See NetworkX doc

Expanders

python preprocess.py --graph=TYPE --nodes=NODES --run_id=RUN_ID 

Current available options are expander-margulis, expander-chordal, expander-paley. See NetworkX expanders doc

Custom

python preprocess.py --graph=NAME --run_id=RUN_ID 

It will look for a file in data/NAME/NAME.edges where the graph should be represented as:

src_node1 dst_node1 [weight1]
src_node2 dst_node2 [weight2]
... 

where src_node and dst_node are int values and weight is an optional float value.

2. Train Graph Embeddings

python -m torch.distributed.launch --nproc_per_node=N_CPUS --master_port=2055 train.py \\
            --n_procs=N_CPUS \\
            --data=PREP \\
            --run_id=RUN_ID \\
            --results_file=out/results.csv \\
            --model=MODEL \\
            --metric=riem \\
            --dims=4 \\
            --learning_rate=1e-2 \\
            --val_every=25 \\
            --patience=50 \\
            --max_grad_norm=100 \\
            --batch_size=2048 \\
            --epochs=1000

Experiments can be run distributed over multiple CPUs/GPUs with N_CPUS. PREP must be the name of the graph to embed (what in step 1 was called RUN_ID). Results will be reported in results_file with run_id as the name. For model and metric see Available Models

Considerations

The method inner is implemented for both the Upper Half space and the Bounded domain model. With this, experiments can be run with RiemannianAdam. However, we found them to be very unstable, therefore all experiments reported in the paper were run with RiemannianSGD

TODO

  • Fix broken tests
  • Add Finsler Metrics to SPD manifold
  • Merge branch with Recommender System experiments into master
  • Merge branch with ploting tools into master

Citation

The source code and data in this repository aims at facilitating the study of graph embeddings in symmetric spaces. If you use the code/data, please cite it as follows:

@InProceedings{lopez2021symmetric,
  title =    {Symmetric Spaces for Graph Embeddings: A Finsler-Riemannian Approach},
  author =       {L\'opez, Federico and Pozzetti, Beatrice and Trettel, Steve and Strube, Michael and Wienhard, Anna},
  booktitle =    {Proceedings of the 38th International Conference on Machine Learning},
  pages =    {7090--7101},
  year =     {2021},
  editor =   {Meila, Marina and Zhang, Tong},
  volume =   {139},
  series =   {Proceedings of Machine Learning Research},
  month =    {18--24 Jul},
  publisher =    {PMLR},
  pdf =      {http://proceedings.mlr.press/v139/lopez21a/lopez21a.pdf},
  url =      {http://proceedings.mlr.press/v139/lopez21a.html}
}