/crane

heavy lifting in clojure

Primary LanguageClojure



How to start a hadoop ec2 cluster using crane.

Workflow:
-def conf map using conf fn
-def new Jec2 class, stores creds
-launch hadoop cluster

Configuration maps:

creds.clj

{:key "AWS-KEY"
 :secretkey "AWS-SECRET-KEY"
 :private-key-path "/path/to/private-key"
 :key-name "key-name"}

conf.clj

{:image "ami-"
 :instance-type :m1.large
 :group "ec2-group"
 :instances int
 :instances int
 :creds "/path/to/creds.clj-dir"
 :push ["/source/path" "/dest/path/file"
        "/another/source/path" "/another/dest/path/file"]
 :hadooppath "/path/to/hadoop"          ;;used for remote pathing
 :hadoopuser "hadoop-user"              ;;remote hadoop user
 :mapredsite "/path/to/local/mapred"    ;;used as local templates
 :coresite "/path/to/local/core-site"
 :hdfssite "/path/to/local/hdfssite"}

Example:

;;read in config "aws.clj"
crane.cluster> (def my-conf (conf "/path/to/conf.clj-dir"))

;;create new Jec2 class
crane.cluster> (def ec (ec2 (creds "/path/to/creds.clj-dir")))

;;start cluster 
(launch-hadoop-cluster ec my-conf)

To build:

1. Download and install leiningen http://github.com/technomancy/leiningen
2. Setup jclouds maven snapshot at http://jclouds.rimuhosting.com/maven2/snapshots
3. $ lein deps
4. $ lein install
5. $ java -cp `ls lib/*|xargs echo|sed 's/ /:/g'`:classes jline.ConsoleRunner clojure.main

core libs for compute cloud, blobstore, and ssh.

supported compute clouds: ec2, rackspace, rimuhosting, vcloud, terremark, hosting.com
supported blobstores: s3, rackspace, azure, atmos online

build upon the core is remote repl and cluster capabilities.

in a couple cases, we use an erlang convention of the ! syntax representing "reomote" (on another process) execution.
the first argument to ! in erlang is the pid, wheras in our api, it is an open ssh session or socket.
there is no ! operator or fn, we just use the ! suffix convention to communicate remote execution.

remote shell execution.
(sh! session "tar -xzf repl.tar.gz")

remote repl evaluation.
eval! 
(eval! socket (execute (workflow some-cascading-workflow)))

TODO:
the persistent shell session using jsch ChannelShell is shaky at best, although exec works fine.
eval! may be simpler if we could use LineNumberingPushbackReader see comment in remote_repl.clj
stuff to poll and pull info from hadoop tracker url into repl