Analyse a big dataset with Hadoop MapReduce.
Refer to assignment1_handout.pdf for the detailed requirements.
## How to run
### Requirements
- Hadoop 2.6.0
- Maven (used for the `mvn package` build step)
### Steps
- Create a directory named `place` in your HDFS home directory and upload `place.txt` into it.
- Create another directory named `photo` in your HDFS home directory and upload `n01.txt` into it.
- Set the `A1_HOME` environment variable to the location where the intermediate output of each job is stored.
- In the directory containing `pom.xml`, run `mvn package`.
- For each task, `cd` into the directory containing `task1.sh` (or `task2.sh`), making sure the script is in the same directory as `MRDriverTask1.class` (or `MRDriverTask2.class`).
- Pass `task1.sh` (or `task2.sh`) an integer argument indicating the job to start from. On the first run, the argument is always `1`.
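The steps above can be sketched as a single shell session. This is a minimal sketch, assuming `hdfs` and `mvn` are on your `PATH` and that `place.txt` and `n01.txt` are in the current local directory. This README does not say whether `A1_HOME` should be a local or an HDFS path, so its value here, like both `/path/to/...` directories, is a hypothetical placeholder to replace with your own.

```shell
# Where each job's intermediate output goes (placeholder value, not from the project).
export A1_HOME=/path/to/intermediate/output

# Relative HDFS paths resolve against your HDFS home directory (/user/<username>).
hdfs dfs -mkdir -p place
hdfs dfs -put place.txt place/
hdfs dfs -mkdir -p photo
hdfs dfs -put n01.txt photo/

# Build from the directory that contains pom.xml (placeholder path).
cd /path/to/project && mvn package

# Run Task 1 from the directory holding task1.sh and MRDriverTask1.class
# (placeholder path). On the first run, start from job 1.
cd /path/to/task1-dir && ./task1.sh 1
```

On a later run, passing a larger number (for example `./task1.sh 2`) should start from that job instead, presumably reusing the intermediate output already stored under `A1_HOME`. Run `task2.sh` the same way from the directory that holds `MRDriverTask2.class`.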