seqbotslurm

A SLURM wrapper around the Biohub seqbot download script


The Chan Zuckerberg Biohub provides an amazing service to affiliated institutions: it runs their sequencing jobs! You provide a library, and they provide a runfolder. But now you have to get the runfolder onto your computer.

With earlier generations of sequencing, runfolder sizes were measured in tens of gigabytes. But modern sequencing platforms deliver runfolders measured in terabytes. Successfully downloading that much data requires disk space, a stable platform (for example, not a laptop), and bandwidth. You may already have all of this, in the form of a compute environment or cluster!

This repository contains a script, s3sync.sh. The script is designed to run in a compute environment that uses SLURM for job scheduling.

  • The script has minimal requirements: It is written in Bash, and only requires the AWS CLI.

  • The script is fairly smart: Instead of having to parse the seqbot .sh file yourself, you just copy-paste the entire thing when the script asks for it. The script will read the seqbot .sh file, and extract the information needed to do the download.

  • The script is automatic: It initially asks for a four-hour runtime from SLURM. If that is not enough time, it will re-submit itself. Once the job is submitted, you only need to get involved when something breaks.

The script was originally written for use in the Stanford Research Computing Center's Sherlock computing environment, but it should work in other SLURM-based clusters. If this is of interest to you, read on for information on what is needed, and how to use it!

Prerequisites

This script has just a few requirements.

To start, you need a compute environment! This script is written for use with the SLURM job scheduler. It will not work out-of-the-box with other schedulers, but could possibly be made to work with them (doing so is left as an exercise to the reader).

You will also need the AWS CLI installed. This also means that you will need some sort of Python installation.
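If the AWS CLI is not already available on your cluster, one way to get it (assuming you have Python and pip available) is a per-user pip install; your site may instead provide it as a module:

    # Install the AWS CLI into your per-user Python environment
    pip install --user awscli

    # Confirm that it is installed and on your PATH
    aws --version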

The script itself is written in Bash. The default Bash shell for your OS should be sufficient.

How to Use

Prepare

You should have received a .sh file from seqbot. It is a text file. Before you can start, you will need this file. Remember, these files are time-limited! Once a download has been made available to you, the access will expire after 36 hours.

To use the script, you must first ensure that the AWS and SLURM commands aws, scontrol, and sbatch are in your default path.

You can use the which command to check this, like so:

example of running 'which'
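A minimal check might look like this (the paths printed will vary from system to system):

    # Confirm that all three commands can be found
    which aws scontrol sbatch

    # On success, which prints one path per command, for example:
    #   /usr/bin/aws
    #   /usr/bin/scontrol
    #   /usr/bin/sbatch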

If the which command returns an error, then you will need to do something to make the command accessible. That might mean loading a module, or changing your PATH environment variable. Here is an example:

example of changing PATH and loading a module
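For instance, on a cluster that uses environment modules, something along these lines may work. The module names here are only examples; run module avail to see what your site provides:

    # Load SLURM and the AWS CLI via environment modules
    # (module names are hypothetical; check `module avail` on your system)
    module load slurm
    module load aws-cli

    # Alternatively, if the AWS CLI lives somewhere non-standard
    # (such as a per-user pip install), add that directory to your PATH:
    export PATH="$HOME/.local/bin:$PATH"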

Running

Pick a place for the files to live. The script will download all of the files into your chosen directory, so you might need to make a new directory to hold the files. Also, make sure your chosen download location has enough room!

Use the cd command to move into the chosen directory, and then run the script:

example of running the s3sync.sh script
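For example, assuming the script is on your PATH (otherwise, call it by its full path) and using a made-up download directory:

    # Make a directory for the runfolder and move into it
    # (the path is an example; pick somewhere with enough free space)
    mkdir /scratch/my_runfolder
    cd /scratch/my_runfolder

    # Run the script from the directory where the files should land
    s3sync.sh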

When prompted, copy-paste the complete seqbot .sh file into the program. When done pasting, press the <Return> (or <Enter>) key, followed by an EOF character (which is <Control-D>). The script will parse the .sh file, extract the download instructions, check that they are valid, and submit the download job to SLURM!

At the end, you will get a job ID number. You can use the number to track the status of your job.

SLURM partition note: By default, the script will submit the job into whatever is your default SLURM partition. If you need to change that, you can add sbatch command-line options to the end of the script invocation. So, instead of running s3sync.sh, you could run s3sync.sh -p special to submit the job to the 'special' SLURM partition.
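For example (the partition and account names here are made up):

    # Submit into the 'special' partition instead of your default
    s3sync.sh -p special

    # Other sbatch options can be appended the same way
    s3sync.sh -p special --account=my_lab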

Once the job begins, you will see files appearing in the directory where you initially ran the script. There will also be a .out file, which logs any messages generated by the download program (if there was nothing to log, the file will be empty).

Monitor

After submission, you will (eventually) receive a number of emails from SLURM:

  • The first email will have "Begin" in the subject line, telling you that your download has started.

    After this point, you should quickly start to see files appearing in your download directory.

  • You may receive an email with "Queued" in the subject line. This tells you that the download exceeded the four-hour time allocation. The download has been paused, and the job re-queued for another four-hour allocation.

  • The last email will have "TBD" in the subject line, telling you that your download is complete!

At any time, you can run the following commands to check your job's status:

  • squeue -j JOBID (where JOBID is your batch job ID number) to see your job's status.

  • scontrol show job JOBID to see details of the job, like the expected start time.

  • If the job is complete (or failed), sacct -j JOBID will show you recorded statistics for the job.
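Put together, the commands above look like this with a hypothetical job ID of 12345678:

    # While the job is pending or running
    squeue -j 12345678

    # Full details, including the expected start time
    scontrol show job 12345678

    # After the job has completed (or failed)
    sacct -j 12345678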

If the download takes so long that your four-hour time allocation is exceeded, SLURM will notify the script of its impending termination. The script will then immediately resubmit itself, asking for another four hours of time.

If the script does have to run multiple times, on each new execution the AWS CLI will check the already-downloaded files, and will only download files that are missing or incomplete.
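This works because the transfer is (presumably, given the script's name) driven by aws s3 sync, which compares the local directory against the bucket and only transfers the differences. A sketch of that kind of command, with a made-up bucket and prefix:

    # Illustrative only: the bucket and prefix are invented for this example.
    # `aws s3 sync` skips files that already exist locally with the same size
    # and an up-to-date timestamp, so re-runs pick up where they left off.
    aws s3 sync s3://example-seqbot-bucket/runs/EXAMPLE_RUNFOLDER/ .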

After your first download

After your first successful run, if you plan on using the script in the future, you should go into the script and adjust the #SBATCH --time line. By default, this script requests four hours of runtime. Depending on your environment, that might be too short (meaning your job had to resubmit itself) or too long (your job completed well under its runtime limit). Adjusting it in either direction has pros and cons.
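For example, to double the default allocation, you would change that line to read something like the following (the exact format of the existing line in the script may differ):

    # Request eight hours per SLURM allocation instead of four
    #SBATCH --time=8:00:00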

If your job completed quickly, reducing the requested runtime may allow your job to run sooner. Shorter jobs are normally able to be scheduled sooner, as SLURM fills the "holes" made between the larger jobs.

If your job took so long that it had to requeue itself, you should consider increasing the requested time for the job. The "pro" of this is, the aws command will not have to waste time checking over existing files (to build a fresh list of what to transfer). The "con" of this is, longer jobs often take longer to schedule.

Copyright, Licensing, and Contributions

The contents of this repository are © 2019 The Board of Trustees of the Leland Stanford Jr. University. It is made available under the MIT License.

Terminal captures were obtained using asciinema, and converted into animated GIF format by asciicast2gif.

Contributions are welcome, if they will fix bugs or improve the clarity of the scripts. If you would like to customize these scripts for your own environment, you should fork this repository, and then commit changes there. You should also update this README, particularly the Parameters and Customization section, to reflect the changes you made.