Primary language: Java. License: MIT.

Hadoop MapReduce average calculator

AverageCalculator is a simple MapReduce application that computes the average grade for each module in a given input set. The application runs in two stages: a map phase and a reduce phase.

1. Map phase

The mapper takes the input file and processes it one line at a time. For each line, the values of the 4th (module) and 5th (grade) columns are stored as a key-value pair in a HashMap. If the key already exists, the new grade is added to the grade total already stored for that module. A second HashMap tracks how many times each module appears in the input. The cleanup step then converts the HashMap contents into a list of <Module, IntPair(valueGrade, valueCount)> entries. Here, the key is the module, and the value is an integer pair holding the summed grade and the total count for that key.
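The map-side aggregation described above can be sketched in plain Java. This is a minimal illustration, not the project's actual code: it leaves out all Hadoop types, the class name MapPhaseSketch and the nested IntPair are hypothetical stand-ins, and it assumes comma-separated columns with integer grades, neither of which the description specifies. In a real job this logic would sit in the map and cleanup methods of a Hadoop Mapper.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the map phase; no Hadoop classes are used.
class MapPhaseSketch {

    // Hypothetical stand-in for the IntPair value: summed grade and record count.
    static final class IntPair {
        final int grade;
        final int count;

        IntPair(int grade, int count) {
            this.grade = grade;
            this.count = count;
        }
    }

    // One map accumulates grade totals, the other counts occurrences per module.
    private final Map<String, Integer> gradeSums = new LinkedHashMap<>();
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // Called once per input line. Assumption: columns are comma-separated.
    void map(String line) {
        String[] cols = line.split(",");
        if (cols.length < 5) {
            return; // skip lines without a 4th and 5th column
        }
        String module = cols[3].trim();
        int grade;
        try {
            grade = Integer.parseInt(cols[4].trim());
        } catch (NumberFormatException e) {
            return; // skip non-numeric grades, e.g. a header row
        }
        // Add the grade to the running total and bump the module's count.
        gradeSums.merge(module, grade, Integer::sum);
        counts.merge(module, 1, Integer::sum);
    }

    // Called once after the last line: emits one (module, IntPair) entry per module.
    List<Map.Entry<String, IntPair>> cleanup() {
        List<Map.Entry<String, IntPair>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : gradeSums.entrySet()) {
            IntPair pair = new IntPair(e.getValue(), counts.get(e.getKey()));
            out.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), pair));
        }
        return out;
    }
}
```

Aggregating inside the mapper and emitting only once in cleanup means each mapper produces a single record per module instead of one per input line, which reduces the data shuffled to the reducers.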

2. Reduce phase

This phase is responsible for calculating the average grade of each module. The reducer receives the list of <Module, IntPair(valueGrade, valueCount)> entries from the map phase. For each module, it iterates over the integer pairs and adds each pair's grade and count to running totals. Finally, the sum of grades is divided by the total count accumulated during the iteration, which yields the final average for each module.
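The reduce step for a single module can be sketched the same way. Again this is a plain-Java illustration under assumptions rather than the project's code: ReducePhaseSketch and its nested IntPair are hypothetical names, Hadoop types are omitted, and the result is returned as a double, although the description does not say whether the original uses integer or floating-point division.

```java
// Plain-Java sketch of the reduce phase for one module; no Hadoop classes are used.
class ReducePhaseSketch {

    // Hypothetical stand-in for the IntPair value: partial grade sum and record count.
    static final class IntPair {
        final int grade;
        final int count;

        IntPair(int grade, int count) {
            this.grade = grade;
            this.count = count;
        }
    }

    // Adds up the partial sums and counts for one module, then divides once.
    static double reduce(Iterable<IntPair> values) {
        long gradeSum = 0;
        long countSum = 0;
        for (IntPair pair : values) {
            gradeSum += pair.grade;
            countSum += pair.count;
        }
        if (countSum == 0) {
            throw new IllegalArgumentException("no records for this module");
        }
        // Assumption: floating-point division, so fractional averages are kept.
        return (double) gradeSum / countSum;
    }
}
```

For example, if one mapper emits (160, 2) and another emits (80, 1) for the same module, the reducer returns 240 / 3 = 80.0. Carrying the sum and the count separately is what keeps this correct: averaging the two per-mapper averages (80 and 80 here happen to agree, but in general they would not) gives the wrong result whenever mappers see different numbers of records.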

License

MIT. Copyright (c) MIT License.