Performs analysis on data extracted from Reddit using Hadoop MapReduce.
- Source `utils.sh`: `source utils.sh`
- Start HDFS and YARN; you can use `start-hdfs` and `start-yarn` from `utils.sh` (tested on Hadoop 2.9.2)
- Get the NLTK dependencies by running `nltk-deps`
- Put the input file in the input folder and move the files to HDFS by running `move-files`
- Start the MapReduce job by running `rmapred`
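The contents of `utils.sh` are not shown here. As a rough, hypothetical sketch of what helpers with these names could wrap on Hadoop 2.9.2, the functions below call the standard `start-dfs.sh`/`start-yarn.sh` scripts, `hdfs dfs`, the NLTK downloader and Hadoop Streaming. Everything beyond the helper names is an assumption: the `HADOOP_HOME` default, the HDFS paths `input`/`output`, the NLTK packages `punkt` and `stopwords`, the `mapper.py`/`reducer.py` scripts, and the `DRY_RUN` switch that prints each command instead of running it. The project's actual `utils.sh` may differ.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL sketch of utils.sh helpers; not the project's actual script.
# Assumptions: Hadoop 2.9.2 under HADOOP_HOME, a Hadoop Streaming job with
# Python mapper.py/reducer.py, and relative HDFS paths (resolved under /user/<name>).

HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
LOCAL_INPUT="${LOCAL_INPUT:-input}"     # local folder holding the input file(s)
HDFS_INPUT="${HDFS_INPUT:-input}"       # hypothetical HDFS input directory
HDFS_OUTPUT="${HDFS_OUTPUT:-output}"    # hypothetical HDFS output directory

# Run a command, or only print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Start the HDFS daemons (NameNode, DataNode, SecondaryNameNode).
start-hdfs() {
  run "$HADOOP_HOME/sbin/start-dfs.sh"
}

# Start the YARN daemons (ResourceManager, NodeManager).
start-yarn() {
  run "$HADOOP_HOME/sbin/start-yarn.sh"
}

# Install NLTK and download its data; the package list is an assumption.
nltk-deps() {
  run python3 -m pip install --user nltk
  run python3 -m nltk.downloader punkt stopwords
}

# Create the HDFS input directory and copy the local input files into it.
move-files() {
  run "$HADOOP_HOME/bin/hdfs" dfs -mkdir -p "$HDFS_INPUT"
  run "$HADOOP_HOME/bin/hdfs" dfs -put -f "$LOCAL_INPUT"/* "$HDFS_INPUT"
}

# Submit a Hadoop Streaming job with hypothetical mapper/reducer scripts.
rmapred() {
  # MapReduce refuses to write into an existing output directory, so clear it first.
  run "$HADOOP_HOME/bin/hdfs" dfs -rm -r -f "$HDFS_OUTPUT"
  run "$HADOOP_HOME/bin/hadoop" jar \
    "$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.9.2.jar" \
    -files mapper.py,reducer.py \
    -mapper "python3 mapper.py" \
    -reducer "python3 reducer.py" \
    -input "$HDFS_INPUT" \
    -output "$HDFS_OUTPUT"
}
```

With this sketch sourced, `DRY_RUN=1 rmapred` prints the two commands the job would run without touching the cluster. Note that `-files` is a generic Hadoop option, so it has to come before the streaming-specific options such as `-mapper` and `-input`.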