A MapReduce framework with a streamlined API from first principles, running on AWS.
For the purpose of this example, we will assume that you have cloned this repository in your home folder. (~/slim-map-reduce)
Navigate to slim-map-reduce
folder and run the Bash script build.sh
. slim-map-reduce.jar
will be available in dist
directory.
Note: Your machine should have Java 8, Maven and awscli already installed. Make sure to set the default output type of awscli as JSON.
Navigate to slim-map-reduce folder and in the binsh
directory, copy smr.config.template
to smr.config
. Provide the key-pair name, security group and the absolute path to slim-map-reduce.jar
in smr.config
.
Copy slim-map-reduce.jar
from dist
folder in to your project folder and add it to your classpath. You can find example implementations under src/main/java/com/examples
directory.
Make sure that you have followed the previous steps before deploying in AWS cluster. Let us assume that you are working in a project folder named WordCount
and you have added slim-map-reduce.jar
in the classpath of your project to use the MapReduce API. To start an AWS cluster with 1 master and 4 slaves, give the following command from your project folder
~/slim-map-reduce/binsh/automate.sh start 4
Note that in the above command, you have to give a relative or absolute path to automate.sh script. Optionally, export the location of slim-map-reduce/binsh/automate.sh
in your PATH environment variable, so that you can reference it directly from any folder.
The above command will start 1 master and 4 slave EC2 instances which are waiting for a user program. Package your project (WordCount) which uses Slim Map Reduce API as a JAR and deploy it as follows
~/slim-map-reduce/binsh/automate.sh deploy WordCount.jar s3n://mybucket/input s3n://mybucket/output
Here we have assumed that your input and output folder paths are passed as command line arguments. Wait for the program to finish and check the output bucket for _SUCCESS
file.
Use the following command
~/slim-map-reduce/binsh/automate.sh stop