Hadoop_ecosystem

Performs some analysis on data extracted from Reddit.

How to run

Source utils.sh: source utils.sh
Start HDFS and YARN; you can use start-yarn and start-hdfs from utils.sh. Tested on Hadoop 2.9.2.
Get the NLTK dependency libraries by running nltk-deps
Put the input file in the input folder, then move the files to HDFS by running move-files
Start the MapReduce job by running rmapred
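
The steps above can be chained so that a failure stops the run early. The following is a minimal bash sketch, not part of the repository: run_steps is a hypothetical helper, and it assumes that start-hdfs, start-yarn, nltk-deps, move-files and rmapred are already available in the current shell after sourcing utils.sh. Starting HDFS before YARN is also an assumption. The input file still has to be placed in the input folder by hand beforehand.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not defined in utils.sh): runs the README steps in
# order and stops at the first missing or failing step.
run_steps() {
    local step
    # Assumed order: HDFS first, then YARN, then dependencies, data, job.
    for step in start-hdfs start-yarn nltk-deps move-files rmapred; do
        # Each step must already be defined, e.g. via `source utils.sh`.
        if ! type "$step" >/dev/null 2>&1; then
            echo "missing command: $step (did you source utils.sh?)" >&2
            return 1
        fi
        echo "running: $step"
        if ! "$step"; then
            echo "step failed: $step" >&2
            return 1
        fi
    done
}
```

To use it, source utils.sh in the same shell, define run_steps, and then call run_steps. It only wraps the commands listed above and does not change what they do.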