Hadoop PageRank

A PageRank algorithm implementation that makes use of the Apache Hadoop framework.

Execute the program

  • Install Hadoop on your machine [OSX], [Linux]
  • Pick a dataset from the Stanford web graphs collection
  • Place the dataset in your Hadoop FS
  • Create the directory that will contain the output
  • Build a JAR from this source code and name it pagerank.jar
  • Launch the program using Hadoop: hadoop jar pagerank.jar --input <in> --output <out>
  • Browse the PageRank results, which can be found in the Hadoop FS
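
To get a feel for the input before running the job, the following minimal sketch reads a web graph edge list into memory. It assumes the usual layout of the Stanford files: header lines starting with `#`, then one `FromNodeId ToNodeId` pair per line separated by whitespace. The class and method names (`EdgeListSketch`, `parse`) are hypothetical and are not part of this project's source code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: not the parser used by this project.
final class EdgeListSketch {

    // Groups the edges by source node: source id -> ids of the pages it links to.
    static Map<Integer, List<Integer>> parse(List<String> lines) {
        Map<Integer, List<Integer>> outLinks = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines and the '#' header comments (assumed format).
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            int from = Integer.parseInt(parts[0]);
            int to = Integer.parseInt(parts[1]);
            outLinks.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        }
        return outLinks;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "# FromNodeId\tToNodeId",
            "0\t1",
            "0\t2",
            "1\t2");
        System.out.println(parse(sample)); // prints {0=[1, 2], 1=[2]}
    }
}
```

A small check like this can help confirm that a downloaded file has the expected shape before it is copied into the Hadoop FS.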

Usage reference

  • --help (-h): display the help text
  • --damping (-d): the damping factor [OPTIONAL] [DEFAULT = 0.85]
  • --count (-c): the number of iterations [OPTIONAL] [DEFAULT = 2]
  • --input (-i): the directory of the input graph [REQUIRED]
  • --output (-o): the directory of the output result [REQUIRED]
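
To illustrate what the --damping and --count options control, here is a small in-memory sketch of iterative PageRank. It is not the MapReduce code of this project: it assumes the normalized variant of the formula, rank(p) = (1 - d) / N + d * sum(rank(q) / outdegree(q)) over the pages q linking to p, starts every page at 1 / N, and simply drops the rank of pages without outgoing links. The project may use a different variant. The names `PageRankSketch` and `rank` are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: a single-machine sketch, not this project's Hadoop jobs.
final class PageRankSketch {

    // outLinks: page id -> ids of the pages it links to.
    // damping:  plays the role of --damping; count: plays the role of --count.
    static Map<Integer, Double> rank(Map<Integer, List<Integer>> outLinks,
                                     double damping, int count) {
        // Collect every page that appears as a source or as a target.
        Set<Integer> nodes = new TreeSet<>(outLinks.keySet());
        for (List<Integer> targets : outLinks.values()) {
            nodes.addAll(targets);
        }
        int n = nodes.size();

        // Assumed starting point: rank spread evenly over all pages.
        Map<Integer, Double> ranks = new TreeMap<>();
        for (int node : nodes) {
            ranks.put(node, 1.0 / n);
        }

        for (int i = 0; i < count; i++) {
            // Every page receives the (1 - d) / N base share.
            Map<Integer, Double> next = new TreeMap<>();
            for (int node : nodes) {
                next.put(node, (1.0 - damping) / n);
            }
            // Each page splits its damped rank evenly among its out-links.
            for (Map.Entry<Integer, List<Integer>> e : outLinks.entrySet()) {
                List<Integer> targets = e.getValue();
                if (targets.isEmpty()) {
                    continue; // simplification: rank of dangling pages is dropped
                }
                double share = damping * ranks.get(e.getKey()) / targets.size();
                for (int target : targets) {
                    next.merge(target, share, Double::sum);
                }
            }
            ranks = next;
        }
        return ranks;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = Map.of(
            0, List.of(1, 2),
            1, List.of(2),
            2, List.of(0));
        // Same values as the documented defaults: damping 0.85, 2 iterations.
        System.out.println(rank(graph, 0.85, 2));
    }
}
```

A higher iteration count moves the ranks closer to their stable values at the cost of more passes over the graph, while the damping factor sets how much rank flows along links versus being spread evenly over all pages.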