Snakemake is a Pythonic workflow description language that is easily configurable to run in all sorts of environments. Since version 4.1, Snakemake has included a feature called 'profiles', which makes it easy to exchange configuration presets for running in a particular environment. This repository contains a Snakemake profile for running your workflow on the Broad's UGER cluster.
The recommended way to use this Snakemake profile is to create a separate conda
environment for your project. This environment will contain a separate Python
installation specifically for your project, where you control which packages
are installed. In the example below we will create an environment named
snakemake (with the -n switch), but you can name it anything you want.
Furthermore, Snakemake requires Python>=3.5, so we install Python 3 along with
two additional packages: Snakemake itself and the package cookiecutter (used
to install this profile).
use .anaconda3-5.0.1
# Create new conda environment with up to date snakemake
conda create -n snakemake python=3
source activate snakemake
pip install snakemake cookiecutter
# (Optional) You can now install additional dependencies specific to your
# project
conda install numpy scipy ...
NB: By default, Conda creates the environment in your home directory. At
Broad, your home directory is limited to 5GB, so it may fill up quickly. It's
probably a good idea to store the Conda environment somewhere else. To do so,
replace -n snakemake with --prefix /path/where/env/will/be/stored, and pass the
same path to the source activate command.
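The prefix-based variant described in the note above could look roughly like the sketch below. The helper function name create_snakemake_env is hypothetical and exists only for illustration, and the path is the same placeholder used in the note. The --yes flag simply skips conda's confirmation prompt.

```shell
# Hypothetical helper (not part of this profile): create the project
# environment under an explicit prefix instead of in the home directory.
create_snakemake_env() {
    env_prefix="$1"

    # Same steps as above, but with --prefix instead of -n
    conda create --yes --prefix "$env_prefix" python=3

    # With --prefix, activate by path rather than by name
    source activate "$env_prefix"
    pip install snakemake cookiecutter
}

# Usage (placeholder path, replace with a location outside your home directory):
#   use .anaconda3-5.0.1
#   create_snakemake_env /path/where/env/will/be/stored
```

Defining the steps as a function means nothing is installed until you call it with a real path.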
Change to the directory containing your Snakefile and issue the following
command:
cookiecutter gh:eachanjohnson/snakemake-broad-uger
This command will ask a few questions:
- You can optionally specify a different profile name than the default (broad-uger).
- Whether to use the --immediate-submit option of Snakemake. Currently not recommended, until this fix is included in a release.
- Specify the name (when using -n above) or the path (when using --prefix above) of the conda environment you want to use.
- Which dotkits to use in job submission. You'll at least need to specify a conda distribution.
We're ready to go! To use this profile invoke Snakemake as follows:
snakemake --profile broad-uger ...
If you're not using --immediate-submit, the Snakemake master process must be
alive for the whole duration of your workflow (i.e. until all jobs have
finished). My recommendation would be to start the Snakemake process on one of
the login nodes, in a screen session. This makes sure the Snakemake master
process doesn't get killed when you lose your SSH connection.
Example:
# Start screen session with snakemake in the background
screen -dmS snakemake snakemake --profile broad-uger ...
# View output:
screen -x snakemake
The Snakemake master process is lightweight, so it shouldn't be a problem to run this on the login node.
This profile determines the runtime, memory and number of cores as follows:
- Runtime: specify in your --cluster-config file, with key runtime.
- Memory: specify in your rule under resources with key mem_mb. Can be overridden by specifying a value in your --cluster-config file.
- Cores/CPUs: specify using threads per rule.
- UGER project: specify in your --cluster-config file, with key project.
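As a rough illustration of how these settings fit together, the sketch below writes a minimal cluster config and Snakefile into a scratch directory. The keys runtime, project, mem_mb and threads are the ones listed above. Everything else is an assumption made for the example: the directory name uger-profile-demo, the rule name align_reads, the file names, the project name, and the HH:MM:SS runtime format. The __default__ section and per-rule sections follow Snakemake's general cluster config layout.

```shell
# Write the example files into a scratch directory so that no existing
# Snakefile or config is overwritten.
demo_dir="uger-profile-demo"
mkdir -p "$demo_dir"

# Cluster config: runtime and UGER project (all values are placeholders)
cat > "$demo_dir/cluster.yaml" <<'EOF'
__default__:
  runtime: "02:00:00"      # assumed HH:MM:SS format
  project: "my-project"    # hypothetical UGER project name
align_reads:
  runtime: "12:00:00"      # per-rule override of the default
EOF

# Snakefile: memory via resources (mem_mb), cores via threads
cat > "$demo_dir/Snakefile" <<'EOF'
rule align_reads:
    input: "reads.fastq"
    output: "aligned.bam"
    threads: 4
    resources:
        mem_mb=8000
    shell: "echo 'alignment command goes here' > {output}"
EOF

# To run the workflow with this profile (not executed here):
#   cd uger-profile-demo
#   snakemake --profile broad-uger --cluster-config cluster.yaml
```

Keeping runtime and project in the cluster config, and memory and threads in the rule itself, mirrors the split described in the list above.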
The cluster submission and jobscripts are partly taken/inspired by the corresponding files in the broadinstitute/viral-ngs repository.