How to start a hadoop ec2 cluster using crane.

Workflow:
-def conf map using conf fn
-def new Jec2 class, stores creds
-launch hadoop cluster

Configuration maps:

creds.clj

    {:key "AWS-KEY"
     :secretkey "AWS-SECRET-KEY"
     :private-key-path "/path/to/private-key"
     :key-name "key-name"}

conf.clj

    {:image "ami-"
     :instance-type :m1.large
     :group "ec2-group"
     :instances int
     :creds "/path/to/creds.clj-dir"
     :push ["/source/path" "/dest/path/file"
            "/another/source/path" "/another/dest/path/file"]
     :hadooppath "/path/to/hadoop"          ;;used for remote pathing
     :hadoopuser "hadoop-user"              ;;remote hadoop user
     :mapredsite "/path/to/local/mapred"    ;;used as local templates
     :coresite "/path/to/local/core-site"
     :hdfssite "/path/to/local/hdfssite"}

Example:

    ;;read in config "aws.clj"
    crane.cluster> (def my-conf (conf "/path/to/conf.clj-dir"))

    ;;create new Jec2 class
    crane.cluster> (def ec (ec2 (creds "/path/to/creds.clj-dir")))

    ;;start cluster
    crane.cluster> (launch-hadoop-cluster ec my-conf)

To build:

1. Download and install leiningen http://github.com/technomancy/leiningen
2. Set up the jclouds maven snapshot repository at http://jclouds.rimuhosting.com/maven2/snapshots
3. $ lein deps
4. $ lein install
5. $ java -cp `ls lib/*|xargs echo|sed 's/ /:/g'`:classes jline.ConsoleRunner clojure.main

Core libs for compute cloud, blobstore, and ssh.

supported compute clouds: ec2, rackspace, rimuhosting, vcloud, terremark, hosting.com
supported blobstores: s3, rackspace, azure, atmos online

Built upon the core are remote repl and cluster capabilities.

In a couple of cases, we use the erlang convention of the ! syntax to represent "remote" (on another process) execution. The first argument to ! in erlang is the pid, whereas in our api it is an open ssh session or socket. There is no ! operator or fn; we just use the ! suffix as a naming convention to communicate remote execution.

remote shell execution with sh!:

    (sh! session "tar -xzf repl.tar.gz")

remote repl evaluation with eval!:

    (eval! socket (execute (workflow some-cascading-workflow)))

TODO:
-the persistent shell session using jsch ChannelShell is shaky at best, although exec works fine.
-eval! may be simpler if we could use LineNumberingPushbackReader; see comment in remote_repl.clj
-stuff to poll and pull info from hadoop tracker url into repl
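
A note on the classpath expression in build step 5: the backquoted pipeline lists every file under lib/, joins the names onto one line, and swaps the spaces for colons so java gets a single -cp argument. The sketch below reproduces that pipeline in a scratch directory. The directory layout and the jar names a.jar and b.jar are made up for illustration and stand in for whatever lein deps actually puts in lib/.

```shell
# Hypothetical scratch layout standing in for a project after `lein deps`.
workdir=$(mktemp -d)
mkdir -p "$workdir/lib" "$workdir/classes"
touch "$workdir/lib/a.jar" "$workdir/lib/b.jar"
cd "$workdir"

# Same pipeline as build step 5: list the jars, collapse them onto
# one line with xargs, then replace the separating spaces with colons.
cp_entries=$(ls lib/* | xargs echo | sed 's/ /:/g')

# Append the compiled classes directory, as the build step does.
classpath="$cp_entries:classes"
echo "$classpath"   # prints lib/a.jar:lib/b.jar:classes
```

Because sed replaces every space, this approach presumably breaks if a jar path contains spaces, so it is best run from a project directory whose lib/ entries have plain names.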