
MapReduceOnGCP

Primary Language: Python

File Structure

  • The implementation is fully modularized and split across the following files
  • mapreduce.py contains the implementation of the master
  • mapper.py contains the implementation of the mapper
  • reducer.py contains the implementation of the reducer
  • key_value_store.py contains the datastore implementation
  • comm_client.py contains the implementation of the communication client (see the sketch after this list)
  • skeleton_map_function.py contains skeleton code for a map function file
  • skeleton_reduce_function.py contains skeleton code for a reduce function file
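
The wire protocol that comm_client.py speaks to the datastore is not documented in this README. Purely as a sketch, a minimal datastore client might look like the following, assuming a simple line-based text protocol over TCP (the CommClient name, the SET/GET commands, and the connection-per-request behaviour are all assumptions, not the actual API):

    import socket

    class CommClient:
        """Hypothetical datastore client; the real comm_client.py may differ."""

        def __init__(self, ip, port):
            self.ip, self.port = ip, int(port)

        def _request(self, message):
            # One connection per request; read until the server closes the socket.
            with socket.create_connection((self.ip, self.port)) as sock:
                sock.sendall(message.encode("utf-8"))
                chunks = []
                while True:
                    data = sock.recv(4096)
                    if not data:
                        break
                    chunks.append(data)
            return b"".join(chunks).decode("utf-8")

        def set(self, key, value):
            return self._request(f"SET {key} {value}\n")

        def get(self, key):
            return self._request(f"GET {key}\n")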

Design and flow

  1. The master takes as input the input file, the number of mappers and reducers, and the Python files containing the map and reduce functions
  2. The master starts the key-value store
  3. The master reads the input and distributes it among the mappers; each mapper's share is stored in the datastore in mapper_1: list(inputs for mapper_1) format
  4. The master spawns all the mappers, passing the map function file, the datastore key to read its input from, the number of reducers, and the IP and port of the datastore as command line arguments, then polls and waits for all the mappers to finish (a sketch of steps 3 and 4 follows this list)
  5. Each mapper reads its assigned input, processes it, and stores the result in a local file. The mapper then reads this file back and sends all the key-value pairs to the datastore, along with the assignment of each pair to a reducer (see the partitioning sketch after this list)
  6. Once all the mappers are done processing, the master gets control again and spawns the reducers with the appropriate reduce function. Each reducer gets the datastore key identifying its assigned input as a command line argument, along with the datastore IP and port. The master waits for all the reducers to finish
  7. Each reducer then fetches all the keys it needs to process and prints the output to the console
  8. Once all the reducers are complete, the master shuts down the datastore
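
As a rough illustration of steps 3 and 4, the master's partition-and-spawn logic might resemble the sketch below. The run_mappers helper, the round-robin split, the newline-joined value format, and the store client object (assumed to behave like the CommClient sketch above) are all illustrative assumptions; only the mapper_N key naming and the command line contents come from the flow above.

    import subprocess
    import sys

    def run_mappers(input_lines, num_mappers, map_function_file,
                    num_reducers, datastore_ip, datastore_port, store):
        # Step 3: round-robin the input lines across mappers and store each
        # mapper's share in the datastore under its mapper_N key.
        for i in range(num_mappers):
            shard = input_lines[i::num_mappers]
            store.set(f"mapper_{i + 1}", "\n".join(shard))

        # Step 4: spawn one mapper process per shard, passing everything the
        # mapper needs as command line arguments, then wait for all of them.
        procs = [
            subprocess.Popen([
                sys.executable, "mapper.py", map_function_file,
                f"mapper_{i + 1}", str(num_reducers),
                datastore_ip, str(datastore_port),
            ])
            for i in range(num_mappers)
        ]
        for proc in procs:
            proc.wait()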
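
For step 5, each mapper has to route every intermediate key to a fixed reducer, and in step 7 the reducer groups all the values per key before reducing. A minimal sketch, assuming hash partitioning (the function names below are illustrative, not the repository's actual API):

    import hashlib
    from collections import defaultdict

    def reducer_for(key, num_reducers):
        # Stable hash: Python's built-in hash() is salted per process, so it
        # would route the same key to different reducers in different mappers.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % num_reducers

    def group_by_key(pairs):
        # Reducer side: collect every value emitted for a key before reducing.
        grouped = defaultdict(list)
        for key, value in pairs:
            grouped[key].append(value)
        return grouped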

Usage

This implementation uses Python 3. To start a map reduce job, use the following command:

./mapreduce.py <inputfile> <number_of_mappers> <number_of_reducers> <path to map function file> <path to reduce function file> <datastore ip> <datastore port>

Test cases

  1. This is an example for inverted index; it should create output equivalent to the text in test_case2_expected_op.txt (a rough sketch of such map and reduce functions follows this list)

    ./mapreduce.py file_to_read.txt 2 10 inverted_index_map_function.py inverted_index_reduce_function.py datastore_ip datastore_port

  2. This is a word count example; it should create output equivalent to the text in test_case1_expected_op.txt

    ./mapreduce.py file_to_read.txt 2 10 word_count_map_function.py word_count_reduce_function.py datastore_ip datastore_port
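
The map and reduce function files themselves are not reproduced in this README. As a hedged sketch, the inverted index pair might resemble the following (the names and signatures are guesses; the real files may be structured differently):

    def inverted_index_map(document_name, text):
        # Emit a (word, document_name) pair for every word in the document.
        for word in text.split():
            yield word.lower(), document_name

    def inverted_index_reduce(word, document_names):
        # Collapse the emitted pairs into the set of documents per word.
        return word, sorted(set(document_names))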

How to extend for new map and reduce functions

To use this implementation with custom map and reduce functions, edit the _do_the_word method in skeleton_map_function.py and skeleton_reduce_function.py; everything else works unchanged.
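
For example, filling in _do_the_word for word count might look like the sketch below (the class names and signatures are guesses; only the _do_the_word method name appears in this README):

    class SkeletonMapFunction:
        def _do_the_word(self, line):
            # Word count map: emit (word, 1) for every word on the line.
            return [(word, 1) for word in line.split()]

    class SkeletonReduceFunction:
        def _do_the_word(self, key, values):
            # Word count reduce: sum the counts emitted for this word.
            return key, sum(int(v) for v in values)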

Assumptions

  1. This implementation assumes that the master, the mappers, and the reducers have enough resources to complete the amount of work assigned to them.
  2. The number of mappers and reducers is chosen appropriately to achieve maximum parallelisation.
  3. The datastore runs on the master node and has enough resources to store all the data in memory.