JUST

A tool for constructing and executing experiment pipelines on a cluster

Primary language: Python

(WORK IN PROGRESS)

"just" facilitates the construction and management of bash-based pipelines on a cluster by:

  • Grouping commands under a task name/id
  • Sharing global variables among tasks (for example paths to common programs)
  • Easily scheduling tasks as jobs on a cluster (supports job dependency for tasks with consecutive ids)

Reasons to use "just":

  • Modularity and reusable code: shorter debug cycles
  • Reproducibility: your scripts stay understandable and rerunnable three months from now
  • qsub logging
    • STDOUT/STDERR are logged into files with meaningful names, indicating the task they belong to.
    • qsub logs are synced to the master node
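
As an illustration of the logging and job-dependency points above, here is a minimal sketch of how per-task qsub command lines with task-named log files could be assembled. It is not taken from just.py: the function name `qsub_commands`, the `just_` job-name prefix, the `.out`/`.err` log names and the `.sh` script names are hypothetical, and the flags (`-N`, `-q`, `-o`, `-e`, `-hold_jid`) assume a Grid Engine style `qsub`.

```python
def qsub_commands(task_names, queue, log_dir, prefix="just"):
    """Build one qsub argument list per task, chaining consecutive tasks.

    task_names maps task id -> task name. Nothing is submitted here;
    the function only returns the argument lists.
    """
    commands = []
    previous = None
    for task_id in sorted(task_names):
        # Hypothetical naming scheme: prefix, task id, task name.
        job = f"{prefix}_{task_id}_{task_names[task_id]}"
        cmd = [
            "qsub",
            "-N", job,                       # job name
            "-q", queue,                     # target queue
            "-o", f"{log_dir}/{job}.out",    # STDOUT log named after the task
            "-e", f"{log_dir}/{job}.err",    # STDERR log named after the task
        ]
        if previous is not None:
            # Assumption: Grid Engine holds this job until the named job ends.
            cmd += ["-hold_jid", previous]
        cmd.append(f"{job}.sh")              # hypothetical per-task script
        commands.append(cmd)
        previous = job
    return commands


for cmd in qsub_commands({1: "write", 2: "read"}, "my_queue", "logs"):
    print(" ".join(cmd))
```

Naming the job and both log files after the task id and name is what makes the logs easy to trace back to a task; the hold on the previous job is one way to express a dependency between consecutive ids.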

Usage:

  • Define a sequence of indexed tasks in a file (here named 'tasks.just'):
0:shared_commands:{{
  # anything written here is shared by all tasks at execution time.
  A=1
}}

1:write:{{
  echo $A >> $workdir/1.txt
  # the variable $A is known here since it is defined in task 0
  # $workdir should be defined by the user at the command line
}}

2:read:{{
  cat $workdir/1.txt
}}
  • Execute on the current machine: just.py tasks.just -s 1-2 --workdir test_just
  • Schedule on a cluster: just.py tasks.just -s 1-2 --workdir test_just --q $QUEUE_NAME (e.g. -q '*@@nlp' on ND's CRC)
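
To make the task-file layout concrete, here is a minimal sketch of how a file like 'tasks.just' could be split into tasks and turned into a single bash script for a range such as 1-2. This is an illustration only, not the code in just.py: `parse_tasks`, `build_script` and the regular expression are hypothetical, and the sketch assumes every task closes with `}}` at the start of a line.

```python
import re
import shlex

# Hypothetical pattern for the 'id:name:{{ ... }}' layout; not from just.py.
TASK_RE = re.compile(r"^(\d+):(\w+):\{\{\n(.*?)^\}\}", re.MULTILINE | re.DOTALL)


def parse_tasks(text):
    """Return {task_id: (name, body)} for every task block in text."""
    return {int(i): (name, body) for i, name, body in TASK_RE.findall(text)}


def build_script(tasks, first, last, workdir):
    """Compose a bash script: workdir, then task 0, then tasks first..last."""
    parts = [f"workdir={shlex.quote(workdir)}"]
    if 0 in tasks:
        parts.append(tasks[0][1].rstrip())  # shared commands come first
    for task_id in range(first, last + 1):
        name, body = tasks[task_id]
        parts.append(f"# task {task_id}: {name}")
        parts.append(body.rstrip())
    return "\n".join(parts) + "\n"


EXAMPLE = """0:shared_commands:{{
  A=1
}}

1:write:{{
  echo $A >> $workdir/1.txt
}}

2:read:{{
  cat $workdir/1.txt
}}
"""

tasks = parse_tasks(EXAMPLE)
script = build_script(tasks, 1, 2, "test_just")
print(script)
```

Placing the body of task 0 ahead of the selected tasks is one simple way to make variables such as `$A` visible to every task, which matches the behaviour described for task 0 above.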