Soon to be deprecated in favor of the broadinstitute/warp GitHub repository. Previously: Secondary analysis pipelines

Primary language: WDL. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

skylab

Secondary analysis pipelines for the Human Cell Atlas.

Pipelines in this repository will be migrated to WARP: WDL Analysis Research Pipelines for future maintenance. Old releases and git history will remain here for posterity, but the repository will be archived. If you'd like an email before we archive, please reach out at kdegatano@broadinstitute.org.

Pipelines

How to run pipelines from skylab

For now, use git clone git@github.com:HumanCellAtlas/skylab.git and run the pipeline in Cromwell.

  1. WDL and Cromwell Documentation
  2. Running WDLs in Cromwell

preemptible

Tasks on Cromwell may be run on "preemptible" machines to reduce costs significantly. The catch is that a preemptible machine may be "preempted" at any moment: Google may shut down the task to reuse the resources.

Many tasks default to preemptible = 3, meaning they will be attempted on preemptible instances until they have been preempted 3 times, after which they will be run on a non-preemptible machine. This option may be set to 0 by passing a task-level input to the workflow (e.g. Optimus.StarAlign.preemptible), causing the task to be run without using preemptible machines. Setting it to 0 is especially useful for long-running tasks, which can otherwise take a very long time to finish if they are preempted multiple times.
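
The override described above can be sketched as a small script that edits a workflow inputs mapping before it is handed to Cromwell. This is only an illustration: the helper name disable_preemption and the empty starting inputs are hypothetical, and the only key taken from this README is Optimus.StarAlign.preemptible. A real inputs file would also contain the workflow's required inputs.

```python
import json


def disable_preemption(inputs: dict, task_inputs: list) -> dict:
    """Return a copy of a workflow inputs mapping with the given
    task-level preemptible inputs set to 0 (hypothetical helper)."""
    updated = dict(inputs)  # copy so the caller's mapping is not modified
    for key in task_inputs:
        updated[key] = 0  # 0 means: never schedule this task on a preemptible machine
    return updated


# Placeholder inputs; a real file would list the workflow's required inputs.
base_inputs = {}
overridden = disable_preemption(base_inputs, ["Optimus.StarAlign.preemptible"])

# The resulting JSON is what would be saved as the workflow inputs file.
print(json.dumps(overridden, indent=2))
```

Other tasks that expose a preemptible input could be added to the list in the same way, assuming their fully qualified input names are known.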

maxRetries

Some tasks have a maxRetries runtime attribute specified, with the default value set to zero. You probably shouldn't override the default, and if you do, do so with caution.

Setting it to an integer n greater than zero will make Cromwell retry the task up to n times if it fails for any reason. This can be useful when running a task over and over in a production setting where a high proportion of failures are due to transient problems (e.g. VM dies while job is running) that do not persist when the task is rerun. Even in that situation, it is probably best to set maxRetries to no more than 1 or 2, since if you're running in the cloud you will incur additional charges for each retry.
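
To make the cost trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes that every attempt fails independently with the same probability, which is a simplification and not something Cromwell guarantees; the function names and the example failure rate are illustrative only.

```python
def expected_attempts(p_fail: float, max_retries: int) -> float:
    """Expected number of attempts, assuming each attempt fails
    independently with probability p_fail and the task is retried
    up to max_retries times."""
    # Attempt k+1 only happens if the first k attempts all failed.
    return sum(p_fail ** k for k in range(max_retries + 1))


def prob_task_fails(p_fail: float, max_retries: int) -> float:
    """Probability that the initial attempt and every retry all fail."""
    return p_fail ** (max_retries + 1)


# Illustrative transient failure rate of 10% per attempt (an assumption).
for n in (0, 1, 2, 5):
    print(n, expected_attempts(0.1, n), prob_task_fails(0.1, n))
```

Under these assumptions, the first one or two retries remove most transient failures, while a failure that persists on every attempt (p_fail = 1) simply costs max_retries + 1 full attempts.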

See the Cromwell documentation for more information.