CS 4371.501 Introduction to Big Data Management and Analytics
Homework 2
Before running any of the problems, make sure HDFS is set up:
start-all.sh
hdfs dfs -mkdir /hw2
hdfs dfs -mkdir /hw2/input
hdfs dfs -put /path/to/soc-LiveJournal1Adj.txt /path/to/userdata.txt /hw2/input
hdfs dfs -rm -r /hw2/p*
Output for the following problems will be available on the HDFS at /hw2. Sample output can also be found in the sample_output directory of this repo.
Problem 1: Mutual Friends
hadoop jar hw2.jar hw2.MutualFriends /hw2/input/soc-LiveJournal1Adj.txt /hw2/p1
Problem 3: Average of Friends Age
hadoop jar hw2.jar hw2.AvgFriendAge /hw2/input/soc-LiveJournal1Adj.txt /hw2/input/userdata.txt /hw2/p3
Problem 4: Sorting Friends by Age
hadoop jar hw2.jar hw2.FriendSort /hw2/input/soc-LiveJournal1Adj.txt /hw2/input/userdata.txt /hw2/p4