Arteria Stackstorm Pack

This pack provides re-usable units for automating tasks at a sequencing core facility using the StackStorm event-driven automation platform.

It forms the core of the Arteria automation system, which you can read about on our website or preprint. This pack integrates with a series of bioinformatic micro-services, which can be found at https://github.com/arteria-project.

This repository includes a Docker environment allowing you to install Arteria and its dependencies within a containerized environment.

This pack is intended as a starting point, not a turn-key solution. Most sequencing cores will have a sufficiently unique environment that a specialized solution must be developed, but our goal is to provide components to facilitate this development.

Mission

The components provided by Arteria pack have a two-fold purpose:

To be a point of collaboration for the Arteria community where potentially reusable StackStorm components can be deposited
To provide a quick-start launchpad for organizations interested in implementing an Arteria system

Demo

Here we demonstrate using Docker to bootstrap an Arteria system comprised of arteria-packs and several Arteria microservices. We then use the system to run a simple workflow on a runfolder.

Getting Started

System requirements

You will need to have the following installed:

Installation

git clone https://github.com/arteria-project/arteria-packs
cd arteria-packs
make up

To register the Arteria pack with Stackstorm, run:

docker exec stackstorm st2ctl reload --register-all
docker exec stackstorm st2 run packs.setup_virtualenv packs=arteria

Congratulations, you're now ready to run workflows.

Running the sample workflow

Put a runfolder in the docker-mountpoints/monitored-folder directory.

You can find a suitably small test data set here: https://doi.org/10.5281/zenodo.1204292

Then run:

docker exec stackstorm st2 run arteria.workflow_bcl2fastq_and_checkqc \
  runfolder_path='/opt/monitored-folder/<name of the runfolder>' \
  bcl2fastq_body='{"additional_args": "--ignore-missing-bcls --ignore-missing-filter --ignore-missing-positions --tiles s_1", "use_base_mask": "--use-bases-mask y1n*,n*"}'

Eventually you should see something like this:

id: 5a2516ea10895200eb467b63
action.ref: arteria.workflow_bcl2fastq_and_checkqc
parameters:
  runfolder_path: /opt/monitored-folder/my_runfolder
status: succeeded (286s elapsed)
result_task: mark_as_done
result:
  exit_code: 0
  result: true
  stderr: ''
  stdout: ''
start_timestamp: 2017-12-04T09:35:38.361039Z
end_timestamp: 2017-12-04T09:40:24.743737Z
+--------------------------+--------------------------+--------------------+---------------------------+-------------------------------+
| id                       | status                   | task               | action                    | start_timestamp               |
+--------------------------+--------------------------+--------------------+---------------------------+-------------------------------+
| 5a2516eb10895200eb467b66 | succeeded (1s elapsed)   | get_runfolder_name | core.local                | Mon, 04 Dec 2017 09:35:38 UTC |
| 5a2516eb10895200eb467b68 | succeeded (1s elapsed)   | mark_as_started    | arteria.runfolder-service | Mon, 04 Dec 2017 09:35:39 UTC |
| 5a2516ed10895200eb467b6a | succeeded (1s elapsed)   | start_bcl2fastq    | arteria.bcl2fastq-service | Mon, 04 Dec 2017 09:35:41 UTC |
| 5a2516ef10895200eb467b6c | succeeded (267s elapsed) | poll_bcl2fastq     | arteria.bcl2fastq-service | Mon, 04 Dec 2017 09:35:42 UTC |
| 5a2517fd10895200eb467b6e | succeeded (1s elapsed)   | checkqc            | core.http                 | Mon, 04 Dec 2017 09:40:13 UTC |
| 5a2517fe10895200eb467b70 | succeeded (1s elapsed)   | mark_as_done       | arteria.runfolder-service | Mon, 04 Dec 2017 09:40:14 UTC |
+--------------------------+--------------------------+--------------------+---------------------------+-------------------------------+

Indicating that you have successfully executed a workflow which has demultiplexed the runfolder using bcl2fastq and and checked its quality control statistics using CheckQC.

You can find bcl2fastq output in docker-mountpoints/bcl2fastq-output.

Architecture

This project provides re-usable components for StackStorm in the form of actions, workflows, sensors, and rules.

The StackStorm docs are a comprehensive guide to these concept, but here we provide a summary:

Actions encapsulate system tasks such as calling a web service or running a shell script
Workflows tie actions together
Sensors pick up events from the environment, e.g. listening for new files to appear in a directory, or polling a web service for new events
Rules parse events from sensors and determine if an action or a workflow should be initiated

In order to facilitate quick setup, this repo also provides a Docker environment. In addition to running a StackStorm instance, it also runs a set of Arteria micro-services, which make it possible to run bcl2fastq on an Illumina runfolder, and then check that is passes a set of quality criteria using checkQC

The code is structured as follows:

.
├── actions = StackStorm actions
│   └── workflows = StackStorm workflows
├── docker-conf = config files for the docker images
├── docker-images = Dockerfiles for Arteria containers
│   ├── bcl2fastq-service
│   ├── checkqc-service
│   └── runfolder-service
├── docker-mountpoints = directories mounted to Docker containers
│   ├── bcl2fastq-output = will contain bcl2fastq output from the sample workflow
│   └── monitored-folder = deposit your runfolders here for processing
├── docker-runtime = startup container scripts, see: https://github.com/StackStorm/st2-docker#running-custom-shell-scripts-on-boot
├── rules = StackStorm rules 
├── sensors = StackStorm sensors
└── tests = unit and integration tests

Advanced Usage

Container access

To get into the StackStorm master node, run:

make interact

From there you can issue st2 commands directly, without the docker exec stackstorm prefix.

Running as sudo

If you are running make and docker with sudo you need to do so with the -E flag to ensure that the environment variables get passed correctly. For example:

sudo -E make up

Troubleshooting

You may encounter failures during one or more steps in the workflow:

+--------------------------+------------------------+--------------------+--------------------------+--------------------------+
| id                       | status                 | task               | action                   | start_timestamp          |
+--------------------------+------------------------+--------------------+--------------------------+--------------------------+
| 5c78e3ba8123e6012739119c | succeeded (0s elapsed) | get_runfolder_name | core.local               | Fri, 01 Mar 2019         |
|                          |                        |                    |                          | 07:48:10 UTC             |
| 5c78e3ba8123e6012739119e | succeeded (1s elapsed) | mark_as_started    | arteria.runfolder_servic | Fri, 01 Mar 2019         |
|                          |                        |                    | e                        | 07:48:10 UTC             |
| 5c78e3bb8123e601273911a0 | failed (0s elapsed)    | start_bcl2fastq    | arteria.bcl2fastq_servic | Fri, 01 Mar 2019         |
|                          |                        |                    | e                        | 07:48:11 UTC             |
+--------------------------+------------------------+--------------------+--------------------------+--------------------------+

You can troubleshoot the failed step further by getting the execution id, in this case:

docker exec stackstorm st2 execution get 5c78e3bb8123e601273911a0

Activating sensors

Stackstorm can detect changes in the surrounding environment through sensors. This pack provides a RunfolderSensor, which queries the the runfolder service for state information.

By activating this sensor, we can automatically trigger a workflow once a runfolder is marked "ready" in the runfolder service.

You can confirm that the sensor is activated by running:

st2 sensor list

To connect the sensor and workflow, activate the rule:

docker exec stackstorm st2 rule enable arteria.when_runfolder_is_ready_start_bcl2fastq

Put a runfolder in docker-mountpoints/monitored-folder, and set its state to ready using:

docker exec stackstorm st2 run arteria.runfolder_service cmd="set_state" state="ready" runfolder="/opt/monitored-folder/<name of your runfolder>" url="http://runfolder-service"

Within 15s you should if you execute docker exec stackstorm st2 execution list see that a workflow processing that runfolder has started. This is the way that Arteria can be used to automatically start processes as needed.

You can see details of the sensor's inner workings with:

docker exec stackstorm /opt/stackstorm/st2/bin/st2sensorcontainer --config-file=/etc/st2/st2.conf --debug --sensor-ref=arteria.RunfolderSensor

Re-building the environment

You can remove the existing environment with:

make remove-all

Then, re-run the Installation instructions.

Running tests

docker exec stackstorm st2-run-pack-tests -c -v -p /opt/stackstorm/packs/arteria

Acknowledgements

The docker environment provided here has been heavily inspired by the ones provided by StackStorm and UMCCR.

matrulda/arteria-packs