Using Hadoop to get distributed, parallel make
To get started, download the Hadoop 1.2.1 tarball from one of the available mirrors.
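The download step can be sketched as a couple of commands (the archive URL is an assumption; any Apache mirror carrying the 1.2.1 release works):

```shell
# Fetch and unpack the Hadoop 1.2.1 release (URL assumes the Apache archive mirror)
HADOOP_VERSION=1.2.1
wget "https://archive.apache.org/dist/hadoop/core/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
tar -xzf "hadoop-${HADOOP_VERSION}.tar.gz"
cd "hadoop-${HADOOP_VERSION}"
```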
Whilst downloading, take a look at some docs:
Hadoop principles and tutorial:
Hadoop & Eclipse:
- http://blogs.igalia.com/dpino/2012/10/14/starting-with-hadoop/
- https://github.com/data-tsunami/hello-hadoop
- https://github.com/dpino/Hadoop-Word-Count
- http://www.drdobbs.com/database/hadoop-writing-and-running-your-first-pr/240153197?pgno=1
Hadoop dev mode:
- http://blog.tundramonkey.com/2013/02/24/setting-up-hadoop-on-osx-mountain-lion
- http://importantfish.com/how-to-run-hadoop-in-standalone-mode-using-eclipse-on-mac-os-x/
- http://wiki.apache.org/hadoop/WordCount
Mini cluster:
- setting up mini cluster http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CLIMiniCluster.html
General:
- setting up the environment http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
- intro with screenshots http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/
- Apache map-reduce tutorial https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
- very old, but clears a few things up http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Streaming API:
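With the Streaming API, the mapper and reducer are ordinary commands reading stdin and writing stdout. A minimal word-count sketch (the streaming jar path follows the 1.2.1 tarball layout, and the HDFS input/output names are placeholders, not part of this project):

```shell
# Locally, a streaming word count is just map | sort | reduce over stdin:
printf 'hello world\nhello hadoop\n' | tr -s ' ' '\n' | sort | uniq -c

# The equivalent Hadoop job (jar path assumes the 1.2.1 tarball layout;
# the -input and -output HDFS paths are placeholders):
# hadoop jar contrib/streaming/hadoop-streaming-1.2.1.jar \
#     -input make-echos \
#     -output make-echos-wc \
#     -mapper "tr -s ' ' '\n'" \
#     -reducer "uniq -c"
```

Note that `uniq -c` works as a reducer here only because the streaming framework sorts by key before the reduce phase, so identical words arrive on adjacent lines.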
The folder "make" contains a Maven/Eclipse project built against Hadoop 1.2.1; you can import it into Eclipse. It implements the word count example. You will need Hadoop 1.2.1 up and running.
To compile, just use Maven:
mvn clean compile jar:jar
Let's test it:
# ./make/launch_echos.sh
hadoop fs -rmr make-echos # remove it in case it was there before
hadoop fs -copyFromLocal ../Makefiles/echos/ make-echos # copy the example Makefile
mvn clean install jar:jar # recompile the jar
# jar usage: {folder in HDFS with the Makefile} {goal to build} {name of the Makefile (optional, defaults to Makefile)}
hadoop jar make-0.0.1-SNAPSHOT.jar hadoop_playground.make.Make make-echos all.txt
# verify that the output is in the HDFS
echo "OUTPUT:"
hadoop fs -cat make-echos/all.txt