Multithreading MapReduce for Architectural Analysis Mod
We are writing a multithreaded implementation of Map Reduce based on the following assumptions:
- There are resource constraints and we do not want to use a multi-server setup to parallize the Map Reduce operations.
- This is a simple proof-of-concept example where the author tries to put whatever (or what little) he could learn and understand about Map Reduce and concurrency into practice.
- Mok is the best instructor in Singapore, or perhaps the world.
All these codes were tested on a quad-core 2.3 GHz Macbook Pro 2018. So the performance time may differ largely for a dual-core machine. However, it should still be obvious that there is a performance improvement.
I found it significantly easier to understand functional programming examples written in Lisp, Racket and similar functional languages.
-
At least 1.82 GB of space on your computer
This is because the
generate.bat
file creates 10 datasets each sized at 182 MB. There will 10 of the same files, thus adding up to 1.82 GB of space for the test dataset. We recommend using a large dataset to really see the performance improvements of ~60% in execution time for MapReduce implementation, which is measured usingStopwatch.java
. -
Java 8 and above
-
To get started, you need to first run
generate.bat
to create the sample large dataset for testing the Map Reduce operation. -
Then run
mokreduce_bad.bat
to test the slower, simpler implementation of Map Reduce.a. When prompted for a folder, put in
inputs
b. Run and profit
-
Lastly, run
mokreduce_good.bat
to test the optimized version of Map Reduce that uses concurrency.a. When prompted for a folder, put in
inputs
b. Run and profit