This is a graph-computing implementation of the network embedding algorithm DeepWalk, built on GraphLite. In short, it uses GraphLite to compute the random walk sequences.
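To make "random walk sequence" concrete, here is a minimal single-machine sketch in Python of the uniform, truncated random walks that DeepWalk uses as its training corpus. It is not the GraphLite vertex program from this repository. The names `random_walk` and `build_corpus`, the adjacency-dict input and the parameter names are all assumptions made for illustration.

```python
import random


def random_walk(adjacency, start, walk_length, rng):
    """Return one uniform random walk of at most walk_length vertices."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adjacency.get(walk[-1], [])
        if not neighbors:
            # Dead end: stop early instead of failing.
            break
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(adjacency, walks_per_vertex, walk_length, seed=0):
    """Start walks_per_vertex walks from every vertex (hypothetical helper)."""
    rng = random.Random(seed)
    vertices = sorted(adjacency)
    corpus = []
    for _ in range(walks_per_vertex):
        # Visit the start vertices in a fresh random order on each pass.
        rng.shuffle(vertices)
        for vertex in vertices:
            corpus.append(random_walk(adjacency, vertex, walk_length, rng))
    return corpus


if __name__ == "__main__":
    # Tiny undirected example graph, stored as adjacency lists.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    for walk in build_corpus(graph, walks_per_vertex=2, walk_length=5):
        print(" ".join(map(str, walk)))
```

Each printed line is one walk. In DeepWalk these walks are then treated like sentences and passed to a skip-gram model to learn the vertex embeddings.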
- JDK (> 1.7)
- Hadoop (> 2.6.0)
- protocol buffers
- GraphLite-0.20
cd GraphLite/GraphLite-0.20/example/
Then modify the Makefile, changing:
EXAMPLE_ALGOS=PageRankVertex
to:
EXAMPLE_ALGOS=multi_walk
make
Before running, decide on the parallelism (i.e. the number of workers) and run the command below to partition the input.
hash-partitioner.pl <input_path> <parallelism>
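As a rough illustration of what hash partitioning means, the Python sketch below assigns each edge to a worker by taking its source vertex ID modulo the number of workers. This is an assumption made for illustration only: the exact hashing rule and file layout used by `hash-partitioner.pl` are not described here, and the function name `hash_partition` is hypothetical.

```python
from collections import defaultdict


def hash_partition(edges, num_workers):
    """Group edges by (source vertex ID % num_workers).

    Assumed partitioning rule, not taken from hash-partitioner.pl.
    """
    parts = defaultdict(list)
    for src, dst in edges:
        parts[src % num_workers].append((src, dst))
    # One edge list per worker, in worker order; empty workers get [].
    return [parts[worker] for worker in range(num_workers)]


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 1)]
    for worker, part in enumerate(hash_partition(edges, 2)):
        print(worker, part)
```

Under this rule all out-edges of a vertex land on the same worker, so a worker can choose the next step of a walk without asking another worker for the adjacency list.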
Then use GraphLite to generate the results.
start-graphlite multi_walk.so <partitioned_input_path> <output_path>
We have also simplified the procedure with scripts, so you do not need to build and partition by hand.
You only need to modify the Makefile and run the command below.
make_and_run.sh <input_path> <output_path>
test.sh
Note: all results are saved as tmp.dat.
DeepWalk: Online Learning of Social Representations
GraphLite: A lightweight graph computation platform in C/C++