metapath2vec with TensorFlow

This repo contains an implementation of metapath2vec using TensorFlow.

The main reference, metapath2vec, appeared at KDD 2017.

Please use the author's implementation for formal experiments!!

I wrote this for myself to learn how the algorithm works. I haven't tested it on a big network or checked whether it reproduces the reported performance, so be careful when you use it (there might be a bug :) ). The author's implementation is available here: https://ericdongyx.github.io/metapath2vec/m2v.html.

Requirements

I recommend installing Anaconda and then TensorFlow.

How to use

See the help output for details. It should be self-explanatory.

python main.py --help

You need to provide two files: a text file with node type information, and a text file with paths generated by random walks guided by a meta-path. See data/test_data for sample files. Note that you have to generate the meta-path guided random walks yourself.
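Since the walks have to be generated separately, here is a minimal sketch of a meta-path guided random walk on a toy author/paper graph. Everything in it is an assumption made for illustration: the function name metapath_walk, the toy node ids, the "A"/"P" type labels, and the space-separated one-walk-per-line output. It is not the format main.py is known to expect, so compare against the samples in data/test_data before using it.

```python
import random


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Random walk from `start` that only steps to neighbors matching `metapath`.

    `metapath` is a list of node types whose first and last entries are equal
    (e.g. ["A", "P", "A"]), so the pattern can be repeated cyclically.
    The walk stops early if no neighbor of the required type exists.
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")
    pattern = metapath[:-1]  # drop the repeated last type and cycle over the rest
    walk = [start]
    while len(walk) < walk_length:
        next_type = pattern[len(walk) % len(pattern)]
        candidates = [n for n in adj[walk[-1]] if node_type[n] == next_type]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


# Hypothetical toy graph: authors (A) connected to papers (P).
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P"}
edges = [("a1", "p1"), ("a2", "p1"), ("a2", "p2")]

# Build an undirected adjacency list.
adj = {n: [] for n in node_type}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

rng = random.Random(0)  # fixed seed so the walks are repeatable
authors = [n for n, t in node_type.items() if t == "A"]
walks = [metapath_walk(adj, node_type, a, ["A", "P", "A"], 5, rng) for a in authors]

# Assumed output format: one walk per line, node ids separated by spaces.
lines = [" ".join(w) for w in walks]
print("\n".join(lines))
```

Longer meta-paths work the same way as long as the first and last types match, for example ["A", "P", "V", "P", "A"] if the graph also had venue nodes.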

How to train

Learn embeddings using the random walks:

python main.py --walks ./data/test_data/random_walks.txt --types ./data/test_data/node_type_mapings.txt --log ./log --negative-samples 5 --window 1 --epochs 100 --care-type 0
python main.py --walks ./data/test_data/random_walks.txt --types ./data/test_data/node_type_mapings.txt --log ./log --negative-samples 1 --window 1 --epochs 100 --care-type 1
tensorboard --logdir=./log/

How to load the learned embeddings

import json
import numpy as np

# Map embedding row index -> node id (JSON keys are strings, so cast them to int)
with open("./log/index2nodeid.json") as f:
    index2nodeid = {int(k): v for k, v in json.load(f).items()}
# Reverse map: node id -> embedding row index
nodeid2index = {v: k for k, v in index2nodeid.items()}
node_embeddings = np.load("./log/node_embeddings.npz")["arr_0"]

# Node embedding of "yi"
node_embeddings[nodeid2index["yi"]]
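Once the embeddings are loaded, a common next step is to look up the nearest neighbors of a node. The sketch below ranks nodes by cosine similarity. The helper most_similar is hypothetical and not part of this repo, and the small arrays are toy stand-ins for the real node_embeddings, index2nodeid, and nodeid2index (the node ids "a", "b", and "c" are made up).

```python
import numpy as np


def most_similar(node_id, node_embeddings, nodeid2index, index2nodeid, top_k=3):
    """Return the top_k (node id, cosine similarity) pairs closest to node_id."""
    # Normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(node_embeddings, axis=1, keepdims=True)
    unit = node_embeddings / np.maximum(norms, 1e-12)
    query = nodeid2index[node_id]
    scores = unit @ unit[query]
    scores[query] = -np.inf  # exclude the query node itself
    best = np.argsort(-scores)[:top_k]
    return [(index2nodeid[int(i)], float(scores[i])) for i in best]


# Toy stand-ins for the mappings and the embedding matrix loaded from ./log.
index2nodeid = {0: "yi", 1: "a", 2: "b", 3: "c"}
nodeid2index = {v: k for k, v in index2nodeid.items()}
node_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])

neighbors = most_similar("yi", node_embeddings, nodeid2index, index2nodeid, top_k=2)
print(neighbors)
```

With real data, drop the toy definitions and pass in the objects loaded from ./log instead.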

To do list

  • Make the batch size more than 1.