/flintstone

Ingition for your spark jobs

Primary LanguageShell

flintstone

Ingition for your spark jobs

If you would like to use the public Janelia Spark deploy command, clone this repository like this:

git clone --recursive https://github.com/saalfeldlab/flintstone

Starting from version 1.9, git-clone you can specify a parameter -j <n> to fetch n submodules in parallel.

If your git version is older than 1.6.5, or you already cloned the repository, run this after cloning:

git submodule update --init --recursive

Starting your Spark jobs on the Janelia cluster is really simple. Simply run

[TERMINATE=1] [RUNTIME=<hh:mm:ss>] [TMPDIR=<tmp>] [LSF_PROJECT=<project>] <flintstone-root>/flintstone.sh <MASTER_JOB_ID|N_NODES> <JAR> <CLASS> <ARGV>

from a QLOGIN environment, where

  • <MASTER_JOB_ID|N_NODES> - job id of master (if already started) or number of worker nodes (otherwise)
  • <JAR> - path to jar containing your class and all dependencies
  • <CLASS> - class that holds spark main function
  • <ARGV> - any arguments passed to your class
  • <TERMINATE> - Shutdown master and worker nodes upon completion.
  • <RUNTIME> - Specify wall time of master node (will default to "default"). Will be ignored if connecting to existing master.
  • <TMPDIR> - Use $TMPDIR for tmp file that is used to start job. Defaults to /tmp
  • <SPARK_DEPLOY_CMD> - command used to deploy spark jobs to janelia lsf, defaults to <flintstone-root>/spark-janelia/spark-janelia-lsf