/hpc-workflows

Porting Alan's workflow lesson in snakemake to maestro

Tame Your Workflow with Snakemake

In HPC Intro, learners explored the scheduler on their cluster by launching a program called amdahl. The objective of this lesson is to adapt the manual job submission process into a repeatable, reusable workflow with minimal human intervention. This is accomplished using Snakemake, a modern workflow engine.

If you are interested in learning more about workflow tools, please visit The Workflows Community.

Snakemake is best for single-node jobs

NERSC's Snakemake docs list Snakemake's "cluster mode" as a disadvantage: it submits each "rule" as a separate job, spamming the scheduler with dependent tasks. The main Snakemake process also stays on the login node until all jobs have finished, occupying some resources there.

If you wish to adapt your Python-based program for multi-node cluster execution, consider applying the workflow principles learned from this lesson to the Parsl framework. Again, NERSC's Parsl docs provide helpful tips.

Contributing

This is a translation of the old HPC Workflows lesson using The Carpentries Workbench and R Markdown (Rmd). You are cordially invited to contribute! Please check the list of issues if you're unsure where to start.

Building Locally

If you edit the lesson, it is important to verify that the changes are rendered properly in the online version. The best way to do this is to build the lesson locally. You will need an R environment to do this: as described in the {sandpaper} docs, the environment can be either your terminal or RStudio.

Setup

The environment.yaml file describes a Conda virtual environment that includes R, Snakemake, amdahl, pandoc, and termplotlib: the tools you'll need to develop and run this lesson, as well as some dependencies. To prepare the environment, install Miniconda following the official instructions. Then open a shell application and create a new environment:

you@yours:~$ cd path/to/local/hpc-workflows
you@yours:hpc-workflows$ conda env create -f environment.yaml

N.B.: the environment will be named "workflows" by default. If you prefer another name, add -n «alternate_name» to the command.
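
As a minimal sketch of the naming option above, the command can be wrapped in a small shell function that falls back to the default name. The function name create_lesson_env is hypothetical and not part of this repository; the sketch assumes conda is on your PATH and that you run it from the repository root.

```shell
# Sketch: create the lesson's Conda environment, optionally under another name.
# "workflows" is the default name stated in this README; the helper name
# create_lesson_env is hypothetical.
create_lesson_env() {
    env_name="${1:-workflows}"
    conda env create -f environment.yaml -n "$env_name"
}
```

Calling create_lesson_env with no argument keeps the default name, while create_lesson_env hpc-lesson (an example name) passes -n hpc-lesson instead. Either way, activate the result afterwards with conda activate followed by the name you chose.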

{sandpaper}

{sandpaper} is the engine behind The Carpentries Workbench lesson layout and static website generator. It is an R package and is not among the tools listed in the Conda environment, so it still needs to be installed. Paraphrasing the installation instructions, start R or radian, then install:

you@yours:hpc-workflows$ R --no-restore --no-save
install.packages(c("sandpaper", "varnish", "pegboard", "tinkr"),
 repos = c("https://carpentries.r-universe.dev/", getOption("repos")))

Now you can render the site! From your R session,

library("sandpaper")
sandpaper::serve()

This should output something like the following:

Output created: hpc-workflows/site/docs/index.html
To stop the server, run servr::daemon_stop(1) or restart your R session
Serving the directory hpc-workflows/site/docs at http://127.0.0.1:4321

Click on the link to http://127.0.0.1:4321 or copy and paste it in your browser. You should see any changes you've made to the lesson on the corresponding page(s). If it looks right, you're set to proceed!
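
If you are working on a machine without a browser, for example over SSH, a quick check from a second terminal can confirm that the preview server is answering. The following is only a sketch: check_preview is a hypothetical helper, it assumes curl is installed, and the default URL is simply the one from the sample output above, so substitute whatever address your own session prints.

```shell
# Sketch: report whether the local preview server answers.
# The default URL is taken from the sample output in this README;
# the helper name check_preview is hypothetical.
check_preview() {
    url="${1:-http://127.0.0.1:4321}"
    # --fail makes curl return a non-zero status on HTTP errors;
    # the page body is discarded because only the status matters here.
    if curl --silent --fail --output /dev/null "$url"; then
        echo "preview is up at $url"
    else
        echo "no response from $url"
    fi
}
```

Run check_preview with no argument to test the default address, or pass a different URL as the first argument.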