mapred_engine

This is an implementation of single / multi-core mapreduce.

The multicore implementation calculates the length of the file, since the multiprocess module does not support pickling of generators we create a set of range objects which represent line indices and passes them to the subprocess. Each subprocess iterates lazily through the files and only extracts the wanted lines.

Steps:

git checkout this repo
install venv with:

python3 -m venv env
activate the environment

source env/bin/activate
run setup.py

pip install -e .
Run the scripts, note the -m flag for multicore (default False)

python word_counter.py data/raw/if-kipling.txt -m True

python average_ratings.py data/raw/ratings.txt -m True

bjmask/mapred_engine

mapred_engine