/hadoop-playground

Hacking Hadoop: using MapReduce to execute a parallel, distributed Makefile on a Hadoop cluster, aka forcing the elephant to do what it doesn't want to do


Hadoop Playground

Using Hadoop to get distributed, parallel make

Getting started

To get started, download the Hadoop 1.2.1 tar.gz from one of the available mirrors.

While it downloads, take a look at some docs:

Hadoop principles and tutorial

Hadoop & Eclipse

Hadoop dev mode

Mini cluster:

General:

Streaming API:

Compiling the example

The "make" folder contains a Maven/Eclipse project that uses Hadoop 1.2.1; you can import it into Eclipse. It is the word count example. You will need Hadoop 1.2.1 up and running.

To compile, just use Maven:

mvn clean compile jar:jar
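
For readers who have not seen the word count example mentioned above, the following plain-Java sketch shows the shape of its two phases. It deliberately uses no Hadoop classes, so it is not the code in the "make" project; the class name `WordCountSketch` and its methods are hypothetical and exist only for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, Hadoop-free illustration of the word count map and reduce phases. */
final class WordCountSketch {

    /** "Map" phase: emit one (word, 1) pair per whitespace-separated token. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    /** "Reduce" phase: sum the counts emitted for each distinct word. */
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>(); // sorted keys, like reducer input
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    /** Runs both phases sequentially over all input lines. */
    static Map<String, Integer> count(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("make all", "make clean")));
    }
}
```

On a real cluster the map calls run in parallel on different nodes and the framework groups the pairs by key before the reduce step; the sketch runs everything in one process.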

Let's test it:

# ./make/launch_echos.sh
hadoop fs -rmr make-echos # remove it in case it was there before
hadoop fs -copyFromLocal ../Makefiles/echos/ make-echos # copy the example Makefile
mvn clean install jar:jar # recompile the jar

# jar usage: {folder in HDFS with the Makefile} {goal to build} {name of the Makefile (default: Makefile)}
hadoop jar make-0.0.1-SNAPSHOT.jar hadoop_playground.make.Make make-echos all.txt

# verify that the output is in the HDFS
echo "OUTPUT:"
hadoop fs -cat make-echos/all.txt
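
The usage comment in the script lists three arguments, the last of which appears to fall back to `Makefile` when omitted. A minimal sketch of how such an argument contract could be parsed is given below. The class `MakeArgs` and its methods are hypothetical and are not taken from `hadoop_playground.make.Make`; the default value and the way the paths are joined are assumptions based on the commands above.

```java
/** Hypothetical holder for the three command-line arguments described in the usage comment. */
final class MakeArgs {
    // Assumption: the third argument defaults to "Makefile" when it is not given.
    static final String DEFAULT_MAKEFILE = "Makefile";

    final String hdfsFolder;   // folder in HDFS that contains the Makefile
    final String goal;         // goal (target) to build
    final String makefileName; // name of the Makefile inside the folder

    private MakeArgs(String hdfsFolder, String goal, String makefileName) {
        this.hdfsFolder = hdfsFolder;
        this.goal = goal;
        this.makefileName = makefileName;
    }

    /** Accepts two or three arguments; rejects anything else. */
    static MakeArgs parse(String[] args) {
        if (args.length < 2 || args.length > 3) {
            throw new IllegalArgumentException(
                "usage: <folder in HDFS> <goal to build> [name of the Makefile]");
        }
        String name = args.length == 3 ? args[2] : DEFAULT_MAKEFILE;
        return new MakeArgs(args[0], args[1], name);
    }

    /** Path of the Makefile relative to the HDFS home directory. */
    String makefilePath() {
        return hdfsFolder + "/" + makefileName;
    }

    /** Path where the built goal is expected, matching the "hadoop fs -cat" check. */
    String goalPath() {
        return hdfsFolder + "/" + goal;
    }

    public static void main(String[] args) {
        MakeArgs parsed = parse(new String[] {"make-echos", "all.txt"});
        System.out.println(parsed.makefilePath());
        System.out.println(parsed.goalPath());
    }
}
```

With the arguments from the script, `goalPath()` yields `make-echos/all.txt`, the same file the script prints at the end.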

Using Hadoop for the distributed make