Hadoop Playground

Using Hadoop to get distributed, parallel make

Getting started

To get started, download the Hadoop 1.2.1 tar.gz from one of the available mirrors.
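
The download step can be sketched as below. This is only a sketch: it assumes the 1.2.1 release is still hosted on the Apache archive under the path shown, and that curl and tar are installed. The function names (hadoop_tarball, hadoop_url, fetch_hadoop) are illustrative and not part of this repository.

```shell
#!/bin/sh
# Sketch: build the download URL for a Hadoop release and fetch it on demand.
# Assumption: the release lives under archive.apache.org/dist/hadoop/common/.
HADOOP_VERSION="${HADOOP_VERSION:-1.2.1}"

# Print the tarball file name for the configured version.
hadoop_tarball() {
    printf 'hadoop-%s.tar.gz\n' "$HADOOP_VERSION"
}

# Print the full archive URL for the configured version.
hadoop_url() {
    printf 'https://archive.apache.org/dist/hadoop/common/hadoop-%s/%s\n' \
        "$HADOOP_VERSION" "$(hadoop_tarball)"
}

# Download and unpack the release into the current directory.
# Not called automatically, so running this file has no network side effects.
fetch_hadoop() {
    curl -fLO "$(hadoop_url)" && tar -xzf "$(hadoop_tarball)"
}

echo "Hadoop $HADOOP_VERSION tarball: $(hadoop_url)"
```

Calling fetch_hadoop afterwards performs the actual download and unpacking; a mirror URL can be substituted in hadoop_url if the archive is slow.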

Whilst downloading, take a look at some docs:

Hadoop principles and tutorial

https://docs.marklogic.com/guide/mapreduce/hadoop

Hadoop & Eclipse

Hadoop dev mode

Mini cluster:

setting up mini cluster http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CLIMiniCluster.html

General:

setting up the environment http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
intro with screenshots http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/
Apache map-reduce tutorial https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
very old, but clears a few things up http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

Streaming API:

http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

Compiling the example

The "make" folder contains a Maven/Eclipse project that uses Hadoop 1.2.1. You can import it into Eclipse. It's the word count example. You will need Hadoop 1.2.1 up and running.
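
Before building, it may help to confirm that the required tools are installed. The sketch below only checks that the commands are on the PATH; it does not verify that the Hadoop daemons are actually running. The helper name require_cmds is hypothetical, and java is included on the assumption that both Maven and Hadoop need a JVM.

```shell
#!/bin/sh
# Sketch: report which of the given commands are missing from PATH.
# Returns 0 when every command is found, 1 otherwise.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Tools used by the build and launch steps in this README.
if require_cmds java mvn hadoop; then
    echo "all tools found"
else
    echo "install the missing tools before building"
fi
```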

To compile, just use maven:

mvn clean compile jar:jar

Let's test it:

# ./make/launch_echos.sh
hadoop fs -rmr make-echos # remove in case it was there before
hadoop fs -copyFromLocal ../Makefiles/echos/ make-echos # copy the example Makefile
mvn clean install jar:jar # recompile the jar

# jar usage : {folder in HDFS with the Makefile} {goal to build} {name of the Makefile (=Makefile)}
hadoop jar make-0.0.1-SNAPSHOT.jar hadoop_playground.make.Make make-echos all.txt 
	
# verify that the output is in the HDFS
echo "OUTPUT:"
hadoop fs -cat make-echos/all.txt
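
The jar usage comment in the script above can be turned into a small helper that assembles the command line. This is a sketch under one assumption: the "(=Makefile)" note is read as meaning that the third argument is optional and defaults to "Makefile". The helper name make_cmd is hypothetical. It prints the command rather than running it, so it works without a Hadoop installation.

```shell
#!/bin/sh
# Sketch: assemble the `hadoop jar` command line from the usage comment.
# Assumption: the third argument may be omitted and then defaults to "Makefile".
JAR="${JAR:-make-0.0.1-SNAPSHOT.jar}"
MAIN_CLASS="hadoop_playground.make.Make"

# make_cmd HDFS_FOLDER GOAL [MAKEFILE_NAME]
# Prints the command instead of running it, so it can be inspected first.
make_cmd() {
    if [ "$#" -lt 2 ]; then
        echo "usage: make_cmd HDFS_FOLDER GOAL [MAKEFILE_NAME]" >&2
        return 1
    fi
    echo "hadoop jar $JAR $MAIN_CLASS $1 $2 ${3:-Makefile}"
}

make_cmd make-echos all.txt
```

To run the printed command, pass it to the shell, for example with `make_cmd make-echos all.txt | sh`.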
