sbpack

Upload (sbpack) and download (sbpull) CWL apps to/from any Seven Bridges powered platform. Resolves linked processes, schemadefs, $includes and $imports.

Installation

(It is good practice to install Python programs in a virtual environment. pipx is a very effective tool for installing command line Python tools in isolated environments.)

sbpack requires Python 3.7 or later.

pip3 install pipx  # in case you don't have pipx
pipx ensurepath # ensures CLI application directory is on your $PATH

Install latest release on pypi

pipx install sbpack
# or, if already installed: pipx upgrade sbpack

Install latest (unreleased) code

pipx install git+https://github.com/rabix/sbpack.git
# use pipx upgrade ... if upgrading an existing install

Usage

$ sbpack -h

sbpack v2020.10.05
Upload CWL apps to any Seven Bridges powered platform
(c) Seven Bridges 2020

usage: sbpack [-h] [--filter-non-sbg-tags] profile appid cwl_path

positional arguments:
  profile               SB platform profile as set in the SB API credentials file.
  appid                 Takes the form {user}/{project}/{app_id}.
  cwl_path              Path or URL to the main CWL file to be uploaded.

optional arguments:
  -h, --help            show this help message and exit
  --filter-non-sbg-tags
                        Filter out custom tags that are not 'sbg:'

Uploading workflows defined remotely

sbpack handles local paths and remote URLs uniformly. It can pack and upload a local workflow that links to remote workflows, which may themselves link to further workflows. It can therefore also pack a fully remote workflow.

For example, to pack and upload the workflow located at https://github.com/Duke-GCB/GGR-cwl/blob/master/v1.0/ATAC-seq_pipeline/pipeline-se.cwl, click the Raw button and use the resulting URL, like:

sbpack sbg kghosesbg/sbpla-31744/ATAC-seq-pipeline-se https://raw.githubusercontent.com/Duke-GCB/GGR-cwl/master/v1.0/ATAC-seq_pipeline/pipeline-se.cwl

Local packing

cwlpack <cwl> > packed.cwl

$ cwlpack -h
usage: cwlpack [-h] [--filter-non-sbg-tags] [--json] cwl_path

positional arguments:
  cwl_path              Path or URL to the main CWL file to be uploaded.

optional arguments:
  -h, --help            show this help message and exit
  --filter-non-sbg-tags
                        Filter out custom tags that are not 'sbg:'
  --json                Output in JSON format, not YAML.

The cwlpack utility packs a workflow and prints it to stdout instead of uploading it to an SB platform.
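Packing gathers every linked process into a single document; multi-process packs use the standard CWL $graph layout. As a minimal sketch of what that layout looks like (the helper function and the inline document below are illustrative, not part of sbpack), you can list the processes a packed document contains like so:

```python
def list_process_ids(packed):
    """Return the ids of all processes in a packed CWL document (a dict
    as loaded from YAML or JSON)."""
    if "$graph" in packed:
        # Multi-process pack: every process sits in the "$graph" list.
        return [proc.get("id") for proc in packed["$graph"]]
    # Single process packed flat at the top level.
    return [packed.get("id")]

# A tiny inline stand-in for a real packed.cwl:
packed = {
    "cwlVersion": "v1.0",
    "$graph": [
        {"id": "#main", "class": "Workflow"},
        {"id": "#trim.cwl", "class": "CommandLineTool"},
    ],
}
print(list_process_ids(packed))
```

In a real workflow the "#main" entry is the top-level process and the other ids are the sub-processes that were resolved and inlined.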

Side-note

As an interesting side note, packing a workflow can work around at least two cwltool bugs [1], [2].

Pulling (and unpacking)

sbpull will retrieve CWL from any SB powered platform and save it to local disk.

sbpull sbg admin/sbg-public-data/salmon-workflow-1-2-0/ salmon.cwl

With the --unpack option set, it will also explode the workflow recursively, extracting each sub-process into its own file.

sbpull sbg admin/sbg-public-data/salmon-workflow-1-2-0/ salmon.cwl --unpack

This is useful if you want to use SB platform CWL with your own workflows: pull the relevant CWL into your code repository and use it alongside the rest of your code. With the --unpack option you can access the individual components of the SB CWL workflow separately.

Pulling a particular revision

While

sbpull sbg admin/sbg-public-data/bismark-0-21-0/ bismark.cwl

will pull the latest version of Bismark on the platform,

sbpull sbg admin/sbg-public-data/bismark-0-21-0/2 bismark.cwl

will pull revision 2 of the tool.

Note on reversibility

sbpack and sbpull --unpack are not textually reversible. The packed and unpacked CWL representations are functionally identical; however, if you sbpack a workflow and then sbpull --unpack it, the two will not look the same.

Credentials file and profiles

If you use the SBG API you already have an API credentials file; if not, you should create one. It is located at ~/.sevenbridges/credentials. (Documentation)

Briefly, each section in the SBG credentials file (e.g. [cgc]) is a profile name and has two entries: the API endpoint and an authentication token, which you get from the Developer tab on the platform.

[sbg-us]
api_endpoint = https://api.sbgenomics.com/v2
auth_token   = <dev token here>

[sbg-eu]
api_endpoint = https://eu-api.sbgenomics.com/v2
auth_token   = <dev token here>

[sbg-china]
api_endpoint = https://api.sevenbridges.cn/v2
auth_token   = <dev token here>

[cgc]
api_endpoint = https://cgc-api.sbgenomics.com/v2
auth_token   = <dev token here>

[cavatica]
api_endpoint = https://cavatica-api.sbgenomics.com/v2
auth_token   = <dev token here>

[nhlbi]
api_endpoint = https://api.sb.biodatacatalyst.nhlbi.nih.gov/v2
auth_token   = <dev token here>

You can have several profiles on the same platform if, for example, you are an enterprise user who belongs to several divisions. Please refer to the API documentation for more details.

Reading credentials from env variables

Instead of using the credentials file, you can set the environment variables SB_API_ENDPOINT and SB_AUTH_TOKEN. To use them in sbpack, specify . as the profile, e.g.

sbpack . kghosesbg/sbpla-31744/ATAC-seq-pipeline-se https://raw.githubusercontent.com/Duke-GCB/GGR-cwl/master/v1.0/ATAC-seq_pipeline/pipeline-se.cwl

When the profile is ., sbpack uses the environment variables. If they are not set, the default profile from the credentials file is used.
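The resolution order described above can be sketched as follows. This is an illustrative re-implementation, not sbpack's actual code; the function name resolve_credentials is invented for the example.

```python
# Sketch of credential resolution: a "." profile prefers env variables,
# falling back to the default profile in the credentials file. NOT the
# actual sbpack implementation; for illustration only.
import configparser
import os

def resolve_credentials(profile, credentials_path="~/.sevenbridges/credentials"):
    if profile == ".":
        endpoint = os.environ.get("SB_API_ENDPOINT")
        token = os.environ.get("SB_AUTH_TOKEN")
        if endpoint and token:
            return endpoint, token
        profile = "default"  # env variables missing: fall back to default profile
    config = configparser.ConfigParser()
    config.read(os.path.expanduser(credentials_path))
    return config[profile]["api_endpoint"], config[profile]["auth_token"]
```

The credentials file uses INI syntax, so Python's standard configparser reads it directly, with each [section] acting as a profile name.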

Running the test suite

The pulling test requires two environment variables to be set:

SB_AUTH_TOKEN
SB_API_ENDPOINT