/emr-oozie-sample

AWS Elastic Map Reduce with Oozie workflow system installed as bootstrap actions

Primary LanguageShell

EMR-OOZIE-SAMPLE
----------------

simple example of Elastic Map Reduce bootstrap actions for configuring apache oozie.

To use
======

see the makefile.  pretty straightforward, there is a boostrap action bash
script that will install and configure the oozie software on the emr cluster.  
the script is in the config subdirectory, and the makefile shows how to use it
when launching a emr cluster.

Prerequisites
=============

you need the following:
  - make
  - emr commandline tools (seem below for html link)
  - s3cmd (see below)


to run
======

% make create

will create a new emr cluster the oozie installed using the bootstrap action
in config/config-oozie.sh

% make destroy

will terminate the emr cluster 

% make ssh

this will ssh into the head node of the cluster.

to use oozie
============

the oozie software is installed in /opt/oozie...

the oozie-server, a web-based console, is running on port 11000.  once you create the cluster, you can set up a
ssh tunnel (using % make sshproxy) then open a web browser to http://localhost:11000.  

% make sshproxy

(runs the ssh command that sets a local tunnel)

% make socksproxy

(runns an ssh command with a socks proxy.  then configure your local browser to use a v4 socks proxy as localhost port 8888)


To submit a job, you can use the commandline oozie tool in /opt/oozie.../bin/ozzie

for example (once logged into the head node)::

# change user to user oozie
% sudo bash
% su - oozie

# untar the examples and copy them to hdfs
% cd /opt/oozie-3.1.3-incubating/
% tar -zxvf oozie-examples.tar.gz
% . /home/hadoop/.bashrc
% hadoop fs -mkdir /user/oozie/
% hadoop fs -put examples /user/oozie/

# edit the job.properties file for sample application
% vi examples/apps/map-reduce/job.properties

Here is what I used, for name node, look in /home/hadoop/conf/core-site.xml, 
and for the jobTracker look in /home/hadoop/conf/mapred-site.xml

nameNode=hdfs://10.245.53.236:9000
jobTracker=10.245.53.236:9001
queueName=default
examplesRoot=examples


# now run the job
% ./bin/oozie job  -oozie http://localhost:11000/oozie -config
./examples/apps/map-reduce/job.properties -run
job: 0000001-120927170547709-oozie-oozi-W

Now you can use the oozie web console, or the hadoop job tracker console to track the job.

from the commandline:
oozie@ip-10-245-53-236:/opt/oozie-3.1.3-incubating$ hadoop fs -get /user/oozie/examples/output-data/map-reduce/part-00000 .
oozie@ip-10-245-53-236:/opt/oozie-3.1.3-incubating$ more part-00000 
0	To be or not to be, that is the question;
42	Whether 'tis nobler in the mind to suffer
84	The slings and arrows of outrageous fortune,
129	Or to take arms against a sea of troubles,
172	And by opposing, end them. To die, to sleep;
217	No more; and by a sleep to say we end
...





External Links that are useful:
===============================

http://jayatiatblogs.blogspot.com/2011/05/oozie-installation.html - troubleshooting/tips/tricks with oozie installation

http://incubator.apache.org/oozie/docs/3.1.3/docs/DG_Examples.html - example oozie runs

http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/Bootstrap.html#PredefinedBootstrapActions_RunIf 
docs for bootstrap actions

http://incubator.apache.org/oozie/docs/3.1.3/docs/DG_QuickStart.html  - oozie documentation for install

http://aws.amazon.com/developertools/2264  - emr commandline tools


http://s3tools.org/download - s3cmd commandline tools