An implementation of the PageRank algorithm built on the Apache Hadoop framework.
- Install Hadoop on your machine [OSX], [Linux]
- Pick a dataset from the Stanford web graphs collection
- Place the dataset in your Hadoop FS
- Create the directory which will contain the output
- Build a JAR using this source code and name it pagerank.jar
- Launch the software using Hadoop:
    hadoop jar pagerank.jar --input <in> --output <out>
- Browse the PageRank results, which are written to the output directory in the Hadoop FS
- --help (-h) : display the help text
- --damping (-d) : the damping factor [OPTIONAL] [DEFAULT = 0.85]
- --count (-c) : the number of iterations [OPTIONAL] [DEFAULT = 2]
- --input (-i) : the directory of the input graph [REQUIRED]
- --output (-o) : the directory of the output result [REQUIRED]
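
To make the --damping and --count options more concrete, here is a minimal single-machine sketch of the PageRank update in plain Java. It is not the Hadoop job from this repository: the class and method names (`PageRankSketch`, `parseEdges`, `pageRank`) are hypothetical, the input is assumed to be a whitespace-separated edge list with `#` comment lines (the layout used by the Stanford web graph files), node IDs are assumed to run from 0 to nodeCount - 1, and the formula used is the common normalized variant, which may differ from what pagerank.jar computes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of pagerank.jar.
final class PageRankSketch {

    /** Parses "from to" edge lines, skipping blank lines and '#' comment lines. */
    public static List<int[]> parseEdges(List<String> lines) {
        List<int[]> edges = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            edges.add(new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])});
        }
        return edges;
    }

    /**
     * Runs a fixed number of PageRank iterations.
     * Assumes node IDs lie in [0, nodeCount). Nodes without outgoing links
     * simply lose their rank mass here; a real implementation would usually
     * redistribute it.
     */
    public static double[] pageRank(List<int[]> edges, int nodeCount,
                                    double damping, int iterations) {
        int[] outDegree = new int[nodeCount];
        for (int[] edge : edges) {
            outDegree[edge[0]]++;
        }

        // Start from a uniform distribution.
        double[] rank = new double[nodeCount];
        Arrays.fill(rank, 1.0 / nodeCount);

        for (int i = 0; i < iterations; i++) {
            double[] next = new double[nodeCount];
            // Every node receives the "random jump" share.
            Arrays.fill(next, (1.0 - damping) / nodeCount);
            // Each node passes a damped, equal share of its rank along every outgoing link.
            for (int[] edge : edges) {
                next[edge[1]] += damping * rank[edge[0]] / outDegree[edge[0]];
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny made-up graph, using the defaults listed above (damping 0.85, 2 iterations).
        List<String> lines = List.of("# FromNodeId ToNodeId", "0\t1", "0\t2", "1\t2", "2\t0");
        double[] rank = pageRank(parseEdges(lines), 3, 0.85, 2);
        for (int node = 0; node < rank.length; node++) {
            System.out.printf("node %d: %.4f%n", node, rank[node]);
        }
    }
}
```

A higher damping factor gives more weight to the link structure and less to the uniform jump, and more iterations bring the ranks closer to their converged values, so the default of 2 is best seen as a quick approximation.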