Implementations and comparisons of the HITS (Hubs and Authorities) algorithm in Pig and Spark, with Hive used for data preparation

Primary language: Python

HITS_Algorithm

Implementations of the Hubs and Authorities Algorithm (HITS) in Apache Spark and Pig.
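For readers unfamiliar with HITS, the following is a minimal single-machine sketch in plain Python. It is not the code from this repository: the function name `hits`, the `iterations` parameter, and the choice of L2 normalization are assumptions made for illustration. Each round sets a page's authority score to the sum of the hub scores of the pages linking to it, then sets a page's hub score to the sum of the authority scores of the pages it links to, normalizing after each step.

```python
import math
from collections import defaultdict


def hits(edges, iterations=20):
    """Run a fixed number of HITS rounds on (source, target) edges.

    Hypothetical helper for illustration; returns (hubs, authorities),
    two dicts mapping each node to its score.
    """
    nodes = {node for edge in edges for node in edge}
    out_links = defaultdict(list)  # node -> nodes it links to
    in_links = defaultdict(list)   # node -> nodes linking to it
    for source, target in edges:
        out_links[source].append(target)
        in_links[target].append(source)

    hubs = {node: 1.0 for node in nodes}
    auths = {node: 1.0 for node in nodes}

    def normalize(scores):
        # Scale scores to unit L2 norm; leave an all-zero vector unchanged.
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {node: v / norm for node, v in scores.items()}

    for _ in range(iterations):
        # Authority update: sum the hub scores of all in-neighbours.
        auths = normalize(
            {n: sum(hubs[s] for s in in_links.get(n, [])) for n in nodes}
        )
        # Hub update: sum the authority scores of all out-neighbours.
        hubs = normalize(
            {n: sum(auths[t] for t in out_links.get(n, [])) for n in nodes}
        )
    return hubs, auths


if __name__ == "__main__":
    toy_edges = [("a", "c"), ("b", "c"), ("a", "d")]
    hub_scores, auth_scores = hits(toy_edges)
    print("best hub:", max(hub_scores, key=hub_scores.get))
    print("best authority:", max(auth_scores, key=auth_scores.get))
```

On the toy graph in the `__main__` block, page `c` receives links from both `a` and `b`, so it ends up as the strongest authority, while `a` links to both `c` and `d` and ends up as the strongest hub. The distributed versions express the same two updates as joins and aggregations over the edge list.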

Data

The data consists of page links between Wikipedia pages. The source and a description of the data are available at the link below:

http://haselgrove.id.au/wikipedia.htm

Overview

The Hive directory contains code for reading the data into Hive tables and transforming those tables into an edge list.

The Pig_Implementation directory contains code for implementing the algorithm in Apache Pig.

The Spark_Implementation directory contains code for implementing the algorithm in Apache Spark.
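The edge-list transformation mentioned for the Hive directory can be illustrated with a short Python sketch. This is not the repository's Hive code. It assumes, as an illustration only, that each input line has the form `source: target1 target2 ...` with integer page ids; the function name `adjacency_to_edges` is hypothetical. Check the data description at the link above before relying on this format.

```python
def adjacency_to_edges(lines):
    """Yield (source, target) pairs from adjacency-list lines.

    Assumed line format (not confirmed by this README):
        "source: target1 target2 ..."
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        source, _, targets = line.partition(":")
        for target in targets.split():
            yield int(source), int(target)


if __name__ == "__main__":
    sample = ["1: 2 3", "4: 1"]
    for edge in adjacency_to_edges(sample):
        print(edge)
```

Each output pair is one directed link, which is the shape the hub and authority updates consume.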