danielepantaleone/hadoop-pagerank

PageRank algorithm implementation that makes use of the Apache Hadoop framework

Java

Hadoop PageRank

PageRank algorithm implementation that makes use of the Apache Hadoop framework.

Execute the program

Install Hadoop on your machine (OSX or Linux)
Pick a dataset from the Stanford web graphs collection
Place the dataset in your Hadoop FS
Create the directory that will contain the output
Build a JAR using this source code and name it pagerank.jar
Launch the software using Hadoop: hadoop jar pagerank.jar --input <in> --output <out>
Browse the PageRank output, which can be found in the Hadoop FS
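
To make the expected input more concrete, here is a minimal sketch of how an edge list in the style of the Stanford web graphs could be read into an adjacency list. It assumes the usual layout of those files: comment lines starting with `#`, followed by one `FromNodeId ToNodeId` pair per line separated by whitespace. The class and method names (`EdgeListSketch`, `parse`) are hypothetical and are not taken from this project, which reads its input through Hadoop rather than from an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of this repository.
class EdgeListSketch {

    // Turns "from<whitespace>to" lines into node -> outgoing links.
    // Assumption: lines starting with '#' are comments and are skipped.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> outLinks = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] parts = line.split("\\s+");
            if (parts.length < 2) {
                continue; // skip malformed lines
            }
            // Record the edge from -> to.
            outLinks.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(parts[1]);
            // Make sure the target is also known as a node, even without out-links.
            outLinks.computeIfAbsent(parts[1], k -> new ArrayList<>());
        }
        return outLinks;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "# FromNodeId\tToNodeId",
                "0\t1",
                "0\t2",
                "1\t2");
        System.out.println(parse(sample));
    }
}
```

For a local file, the lines could be obtained with `java.nio.file.Files.readAllLines` before calling `parse`; for a real run the dataset stays in the Hadoop FS as described in the steps above.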

Usage reference

--help (-h): display the help text
--damping (-d): the damping factor [OPTIONAL] [DEFAULT = 0.85]
--count (-c): the number of iterations [OPTIONAL] [DEFAULT = 2]
--input (-i): the directory of the input graph [REQUIRED]
--output (-o): the directory of the output result [REQUIRED]
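
To illustrate what the damping factor and the iteration count control, the following is a minimal in-memory sketch of the PageRank update, not the MapReduce jobs used by this project. It assumes the common normalized form, where each node receives `(1 - d) / N` plus `d` times the rank shared by its in-links, and it simply drops the rank of nodes without out-links; the project's actual formula and handling of such nodes may differ. The names `PageRankSketch` and `rank` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of this repository.
class PageRankSketch {

    // outLinks: node -> outgoing links. Assumption: every link target is also a key.
    // damping corresponds to --damping, count to --count.
    static Map<String, Double> rank(Map<String, List<String>> outLinks, double damping, int count) {
        int n = outLinks.size();
        Map<String, Double> ranks = new HashMap<>();
        for (String node : outLinks.keySet()) {
            ranks.put(node, 1.0 / n); // start from a uniform distribution
        }
        for (int i = 0; i < count; i++) {
            Map<String, Double> next = new HashMap<>();
            for (String node : outLinks.keySet()) {
                next.put(node, (1.0 - damping) / n); // teleport term
            }
            for (Map.Entry<String, List<String>> entry : outLinks.entrySet()) {
                List<String> targets = entry.getValue();
                if (targets.isEmpty()) {
                    continue; // node without out-links: its rank is dropped in this sketch
                }
                // Split the damped rank of this node evenly among its targets.
                double share = damping * ranks.get(entry.getKey()) / targets.size();
                for (String target : targets) {
                    next.merge(target, share, Double::sum);
                }
            }
            ranks = next;
        }
        return ranks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("A", Arrays.asList("B", "C"));
        graph.put("B", Arrays.asList("C"));
        graph.put("C", Arrays.asList("A"));
        // Same defaults as the usage reference: damping 0.85, 2 iterations.
        System.out.println(rank(graph, 0.85, 2));
    }
}
```

A higher iteration count lets the ranks move closer to a fixed point at the cost of more passes over the graph; in a Hadoop setting each pass would typically be at least one MapReduce job.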