In the mapper function we first tokenize the entire input and then find the first occurrence of 'Text="', which marks the beginning of a comment. We then count the words in the comment until the closing '"' is found, which marks the end of the comment.
The length of each comment is sent to the reducer under one single standard key, "key". The reducer sums the values and counts how many values it received, which gives the total number of comments. The sum is divided by the number of comments to give the average, which is sent back to main and displayed.
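The mapper and reducer logic described above can be sketched in plain Java, without any Hadoop classes. This is only an illustration, not the code from the project's .java files: the class name CommentLengthSketch, the method names, and the sample rows are hypothetical, and the row layout (an XML-like line containing Text="...") is an assumption. Unlike the mapper described above, the sketch locates the comment first and tokenizes only the comment text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-in for the map and reduce steps; no Hadoop dependency.
public class CommentLengthSketch {
    // Marker that starts a comment, as described in the text.
    static final String START = "Text=\"";

    // "Map" step: number of words in the comment of one row,
    // or -1 if the row has no complete Text="..." attribute.
    public static int commentLength(String row) {
        int start = row.indexOf(START);
        if (start < 0) {
            return -1;
        }
        start += START.length();
        int end = row.indexOf('"', start); // closing quote ends the comment
        if (end < 0) {
            return -1;
        }
        // StringTokenizer splits on whitespace by default.
        return new StringTokenizer(row.substring(start, end)).countTokens();
    }

    // "Reduce" step: sum all lengths, count them, and divide.
    public static double average(List<Integer> lengths) {
        long sum = 0;
        int count = 0;
        for (int length : lengths) {
            sum += length;
            count++;
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }

    public static void main(String[] args) {
        // Made-up sample rows, assumed to resemble the input format.
        String[] rows = {
            "<row Id=\"1\" Text=\"this is a short comment\" />",
            "<row Id=\"2\" Text=\"another one\" />"
        };
        List<Integer> lengths = new ArrayList<>();
        for (String row : rows) {
            int length = commentLength(row);
            if (length >= 0) {
                lengths.add(length); // all values share one key in the real job
            }
        }
        System.out.println(average(lengths)); // prints 3.5
    }
}
```

In the actual job, the list of lengths corresponds to the values the reducer receives under the single shared key.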
The code can be found in the .java files, and the complete .jar file is also available.
Find below a screenshot of the test run:
- INPUT - In the picture shown below, 11 rows were given as input so that the average length reported by Hadoop MapReduce could be checked manually.
- OUTPUT - As we can see, the total number of words across all comments is divided by the total number of comments, giving us the answer 33.