Primary language: C++. License: MIT.

MultiWalk

MultiWalk is a graph-computing implementation of the network embedding algorithm DeepWalk, built on GraphLite. In short, it uses GraphLite to compute the random walk sequences.
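To make "random walk sequence" concrete, here is a minimal single-machine sketch of one truncated random walk over an adjacency list. It is not the GraphLite vertex program from this repository: the `Graph` alias, the `random_walk` helper and its parameters are hypothetical names chosen for illustration, and the walk simply picks a uniformly random out-neighbour at each step.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Adjacency list: graph[v] holds the out-neighbours of vertex v.
using Graph = std::vector<std::vector<int>>;

// Hypothetical helper (not taken from the MultiWalk sources): returns one
// truncated random walk of at most walk_length vertices starting at `start`.
std::vector<int> random_walk(const Graph& graph, int start,
                             std::size_t walk_length, std::mt19937& rng) {
    assert(start >= 0 && static_cast<std::size_t>(start) < graph.size());
    std::vector<int> walk;
    if (walk_length == 0) return walk;
    walk.reserve(walk_length);
    walk.push_back(start);
    int current = start;
    while (walk.size() < walk_length) {
        const std::vector<int>& neighbours = graph[current];
        if (neighbours.empty()) break;  // dead end: stop the walk early
        // Choose the next vertex uniformly among the out-neighbours.
        std::uniform_int_distribution<std::size_t> pick(0, neighbours.size() - 1);
        current = neighbours[pick(rng)];
        walk.push_back(current);
    }
    return walk;
}
```

In DeepWalk, many such walks per vertex are collected and then fed to a skip-gram model. A distributed version would presumably advance each walk one step per superstep by passing messages between vertices, but that is an assumption about the design, not a description of this code base.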

Prerequisites

  1. JDK (> 1.7)
  2. Hadoop (> 2.6.0)
  3. protocol buffers
  4. GraphLite-0.20

Build

cd GraphLite/GraphLite-0.20/example/  

Then edit the Makefile, changing:

    EXAMPLE_ALGOS=PageRankVertex

to:

    EXAMPLE_ALGOS=multi_walk

Then build:

make

Run

Before running, decide on the parallelism (i.e. the number of workers), then partition the input with the command below.

hash-partitioner.pl <input_path> <parallelism>  
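As a rough illustration of what hash partitioning means here, the sketch below assigns each edge to a worker by taking the source vertex ID modulo the parallelism. This is an assumption about the general technique, not a description of what `hash-partitioner.pl` actually does; `Edge`, `worker_of` and `partition_edges` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Edge = std::pair<std::uint64_t, std::uint64_t>;  // (source, target)

// Hypothetical helper: worker index for a vertex, assuming modulo hashing.
std::size_t worker_of(std::uint64_t vertex_id, std::size_t parallelism) {
    assert(parallelism > 0);
    return static_cast<std::size_t>(vertex_id % parallelism);
}

// Splits an edge list into one bucket per worker, keyed by the source vertex,
// so that all out-edges of a vertex end up on the same worker.
std::vector<std::vector<Edge>> partition_edges(const std::vector<Edge>& edges,
                                               std::size_t parallelism) {
    std::vector<std::vector<Edge>> parts(parallelism);
    for (const Edge& e : edges) {
        parts[worker_of(e.first, parallelism)].push_back(e);
    }
    return parts;
}
```

Because the bucket depends on the parallelism, the input has to be partitioned again whenever the number of workers changes.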

Then use GraphLite to generate the results.

start-graphlite multi_walk.so <partitioned_input_path> <output_path>

Quick Start (recommended)

The scripts below simplify this procedure, so you do not need to build and partition by hand.
You only need to modify the Makefile as described under Build and run one of the commands below.

Test all parallelism levels (1, 2, 4, 8, 16):

make_and_run.sh <input_path> <output_path> 

Test the provided datasets:

test.sh

Note: all results are saved as tmp.dat.

References

DeepWalk: Online Learning of Social Representations

GraphLite: A lightweight graph computation platform in C/C++