/clojure-parallel-runs

Clojure job distribution for Rocks clusters using RabbitMQ.

Primary LanguageClojure

This material is based upon work supported by the National Science
Foundation under Grant No. 1017817. Any opinions, findings, and conclusions or
recommendations expressed in this publication are those of the authors and do
not necessarily reflect the views of the National Science Foundation.

To use:

1. a) Make sure a RabbitMQ server is available.  For the example, that server
      will be located at 127.0.0.1:5672 with username "guest" and password "guest".

   b) Install Leiningen, the Clojure dependency management tool.  It is available
      with installation instructions at: https://github.com/technomancy/leiningen

2. Compile the project by running:

        $ lein deps
        $ lein uberjar

3. Write a Clojure file in the following way.  Here the example will
be doing a calculation with the number arguments.

We may write the following, where params is a map of the parameters to
be used for any given run.  An example set of parameters to be used in this
case is {:name "a=3" :a 5 :b 3 :input 5} (where name is used later to
identify a run using this parameter map).  Also notice that #'example.calculation/run
is added at the end.  This is necessary for a worker to know which function
to call in which namespace run.

---calculation.clj----
(ns example.calculation
  (:use clojure.contrib.math))

(defn run [params]
(let [a (params :a)
      b (params :b)
      x (params :x)]
  (* a (expt b x ))))

#'examples.calculation/run
---------------------

4.  We must define the parameters to be distributed.  This is done in clojure.
As an example, to define parameters for 10 runs with a=3 and b=[0 10)
and 10 runs with a=4 and b=[0 10), put the following clojure code into a
file (here saved as parameters.clj).

---parameters.clj---
(flatten (vector
  (for [i (range 10)]
    {:name "a=3"
     :a 3
     :b i
     :input 10})
  (for [i (range 10)]
    {:name "a=4"
     :a 4
     :b i
     :input 10})))
-------------------

The only requirement is that the code should produce a seq/vec/list of parameter
maps, and those parameter maps should include :name if a log file is to be saved.

5. Distribute the paramaters.

     $ java -jar clojure-parallel-runs.jar \
          --distribute --parameters /home/user/parameters.clj \
          --queue-name my_run \
          --host 127.0.0.1 --port 5672 --user guest --pass guest

   Where host, port, user, and pass pertain to the RabbitMQ server.

6. Start the workers.

    a) Ensure that you have passwordless ssh access to each host in the hosts file.

    b) Make a hosts file of the form:

       your_username_on_host1 host_or_ip_address1
       your_username_on_host2 host_or_ip_address2

       a more concrete example:

       ---hosts---
       btm08 compute-1-1
       btm08 compute-1-2
       brian 127.0.0.1
       -----------

    c) run the command as follows:

       $ ./start-workers.sh "java -jar clojure-parallel-runs.jar \
                                --worker \
                                --file /home/user/calculation.clj \
                                --queue-name my_run \
                                --host 127.0.0.1
                                --port 5672
                                --user guest
                                --pass guest" < hosts

       Where host, port, user, and pass pertain to the RabbitMQ server, and
       queue-name matches the one with which parameters where distributed.

    d) The workers will process the parameters, and the return values will accumulate.
       Workers will exit when there are no more parameters to be retrieved.


7. The output of each run will written to the specified log directory on each
worker.  The log file name will be the value of :name given in the parameter
map followed by the current time in milliseconds.

To retrieve the return values and write them to an output-file run:

     $ java -jar clojure-parallel-runs.jar \
          --output-file /home/user/output-file
          --queue-name my_run \
          --host 127.0.0.1 --port 5672 --user guest --pass guest

In this example for the parameters.clj used above, /home/user/output-file will
contain:

---output-file---
...
{:name "a=3", :paramaters-used {:name "a=3", :a 3, :b 3, :input 10}, :log-file a=3_1289937662263, :return-value 177147
{:name "a=3", :paramaters-used {:name "a=3", :a 3, :b 4, :input 10}, :log-file a=3_1289937685082, :return-value 3145728}
{:name "a=3", :paramaters-used {:name "a=3", :a 3, :b 5, :input 10}, :log-file a=3_1289937716053, :return-value 29296875}
...
-----------------