This folder contains example WDL-based workflows that use Parabricks and run on Amazon Omics.
These are provided AS-IS and are intended to demonstrate conventions, patterns, and best practices for writing workflows for scale. They are intended as starting points that you can customize to fit your specific requirements.
The software prerequisites needed to build a private workflow for Amazon Omics are packaged as a Dockerfile in this repo. We will first build an image from this Dockerfile, run it as a container, log in to the AWS CLI, and then submit jobs to Omics.
Amazon Omics requires that any Docker containers we use are stored in a private Elastic Container Registry (ECR) repository. For this example we will be using a public Parabricks container, so we must copy it into a private ECR repo.
Create a private ECR repo and name it "parabricks". This is easiest to do in the AWS console using these instructions.
Make sure that Omics has permissions to access this ECR repo by editing the Policy JSON according to instructions in the AWS Docs.
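As a rough illustration of what such a repository policy can look like, the sketch below prints a policy that grants the Omics service principal read access to images. The statement ID is arbitrary, and the action list is an assumption based on the usual ECR pull permissions; the AWS Docs remain the authoritative source for the exact JSON.

```python
import json


def omics_ecr_policy():
    """Build a repository policy that lets the Omics service pull images.

    Assumption: pull access needs the three read-only ECR actions below.
    Check the AWS Docs before applying this to a real repository.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "omics workflow access",  # arbitrary label
                "Effect": "Allow",
                "Principal": {"Service": "omics.amazonaws.com"},
                "Action": [
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                    "ecr:BatchCheckLayerAvailability",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the repo's permissions editor.
    print(json.dumps(omics_ecr_policy(), indent=2))
```

The printed JSON can be pasted into the Policy JSON editor for the "parabricks" repo in the console.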
Then on the command line, log in to ECR using:
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.<region>.amazonaws.com
Pull the latest Parabricks Amazon Linux image using:
docker pull nvcr.io/nvidia/clara/nvidia_clara_parabricks_amazon_linux:<version>
Tag the image to get it ready for ECR:
docker tag nvcr.io/nvidia/clara/nvidia_clara_parabricks_amazon_linux:<version> <aws_account_id>.dkr.ecr.<region>.amazonaws.com/parabricks:<version>
Finally, push this image to your private ECR repo:
docker push <aws_account_id>.dkr.ecr.<region>.amazonaws.com/parabricks:<version>
Now we have our Parabricks docker image in a place where Amazon Omics can see it.
For troubleshooting help, please see the Amazon docs on pushing to ECR repos.
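The pull, tag, and push commands above all follow from three values: the AWS account ID, the region, and the image version. The sketch below only assembles the command strings so the naming pattern is easy to check before running anything. The function names are hypothetical helpers, and the account ID, region, and version in the usage example are placeholders.

```python
SOURCE_IMAGE = "nvcr.io/nvidia/clara/nvidia_clara_parabricks_amazon_linux"


def private_image_uri(account_id, region, version, repo="parabricks"):
    """Return the private ECR URI for the image, using the repo name from above."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{version}"


def mirror_commands(account_id, region, version):
    """Return the docker commands that copy the public image into private ECR."""
    source = f"{SOURCE_IMAGE}:{version}"
    target = private_image_uri(account_id, region, version)
    return [
        f"docker pull {source}",
        f"docker tag {source} {target}",
        f"docker push {target}",
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own account ID, region, and version.
    for command in mirror_commands("123456789012", "us-east-1", "x.y.z"):
        print(command)
```

Printing the commands first makes it easy to confirm that the tag pushed to ECR is the same one later used for pb_version.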
First we will build and run a Docker container on our local machine. To build the Docker container, run the following commands:
cd dockerfiles
docker build -t omics-private-workflows .
Now we can run the container:
cd ..
docker run --rm -it -v `pwd`:`pwd` -w `pwd` omics-private-workflows /bin/bash
To submit jobs to Omics, we must log in to the AWS CLI with our preferred credentials. Use the following command to set that up:
aws configure # Make sure to provide AWS Access Key ID, AWS Secret Access Key, and region
Now we are ready to build and submit the private workflows
Update the test.parameters.json file for the workflow that you plan to run. Each workflow has its own copy of this file at:
parabricks/workflows/<workflow-name>/test.parameters.json
In this file, make sure that the paths to the data point to S3 buckets that you have access to, and update the pb_version to match the docker image tag for the Parabricks image you loaded into ECR.
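A quick pre-flight check of the parameters file can catch a stale pb_version or a forgotten bucket path before a run is submitted. The sketch below is one possible way to do that. It assumes the file is a flat JSON object, and every key other than pb_version in the example is hypothetical.

```python
import json


def find_s3_paths(params):
    """Return {parameter name: value} for every string value that is an S3 URI."""
    return {
        key: value
        for key, value in params.items()
        if isinstance(value, str) and value.startswith("s3://")
    }


def check_pb_version(params, image_tag):
    """Return a list of problems; an empty list means pb_version matches the tag."""
    actual = params.get("pb_version")
    if actual != image_tag:
        return [f"pb_version is {actual!r}, expected {image_tag!r}"]
    return []


if __name__ == "__main__":
    # Hypothetical file contents; real files have workflow-specific keys.
    example = json.loads(
        '{"pb_version": "x.y.z", "input_bam": "s3://my-bucket/sample.bam"}'
    )
    print(find_s3_paths(example))          # paths to review for bucket access
    print(check_pb_version(example, "x.y.z"))
```

In practice the dictionary would be loaded with json.load from the workflow's test.parameters.json, and the listed S3 paths reviewed by hand, since the script cannot tell whether you have access to a bucket.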
Now we are ready to build any Parabricks workflow!
Use the following commands to first build this repo, and then to build a workflow. The workflow names can be found in the parabricks/workflows folder:
cd parabricks
make
make run-{workflow_name} # substitute "{workflow_name}" accordingly
If this is the first time running any workflow, make will perform the following build steps:
- Configure and deploy the omx-ecr-helper CDK app. Workflows that run in AWS HealthOmics must have containerized tooling sourced from ECR private image repositories. These workflows use 4 unique container images. The omx-ecr-helper is a CDK application that automates converting container images from public repositories like Quay.io, ECR-Public, and DockerHub to ECR private image repositories.
- Run a Step Functions state machine from omx-ecr-helper to pull container images used by these workflows into ECR private repositories
- Create AWS IAM roles and permissions policies required for workflow runs
- Create an Amazon S3 bucket for staging workflow definition bundles and workflow execution outputs
- Create a zip bundle for the workflow that is registered with AWS HealthOmics
- Start an AWS HealthOmics Workflow run for the workflow with test parameters
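To make the bundling step concrete, here is a minimal sketch of zipping a workflow folder with the Python standard library. It is not the Makefile's actual implementation: the function name is hypothetical, and the choice to include every file in the folder is an assumption.

```python
import zipfile
from pathlib import Path


def bundle_workflow(workflow_dir, bundle_path):
    """Zip every file under workflow_dir, keeping paths relative to that folder.

    Write bundle_path outside workflow_dir so the archive does not include itself.
    """
    workflow_dir = Path(workflow_dir)
    bundle_path = Path(bundle_path)
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(workflow_dir.rglob("*")):
            if path.is_file():
                # Store e.g. "main.wdl" rather than the absolute local path.
                archive.write(path, path.relative_to(workflow_dir).as_posix())
    return bundle_path
```

Keeping paths relative to the workflow folder means the entrypoint WDL sits at the top level of the archive rather than under a local directory prefix.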
Additional artifacts produced by the build process will be generated in build/.
You can customize the build process by modifying conf/default.ini.
To remove local build assets run:
make clean
Note: this command does not delete any deployed AWS resources. You are expected to manage these as needed. Resources of note:
- No cost resources:
  - The omx-ecr-helper CDK app is serverless and does not incur costs when idle.
  - HealthOmics Workflows do not incur costs when not running
- Resources with costs:
  - Amazon ECR Private repositories for container images have a storage cost - see Amazon ECR pricing for more details
  - Data generated and stored in S3 have a storage cost - see Amazon S3 pricing for more details
Each workflow definition and its supporting files are in their own folder with the following structure:
workflows
├── {workflow_name}
│ ├── cli-input.yaml
│ ├── (main | named-entrypoint).wdl
│ ├── parameter-template.json
│ ├── test.parameters.json
│ └── ... additional supporting files ...
...
The parameter-template.json file for each workflow should match inputs{} defined in the workflow{} stanza of the main entrypoint WDL.
The test.parameters.json file is a subset of the parameters used. Two additional parameters, ecr_registry and aws_region, are added and populated based on the AWS profile used during the build process (when you execute make run-{workflow_name}).
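The relationship between the two JSON files and the injected parameters can be sketched as follows. This is an illustration rather than the build's actual logic. It assumes parameter-template.json is a JSON object keyed by parameter name, and that ecr_registry takes the form of the private registry hostname used earlier. Both function names are hypothetical.

```python
def unknown_parameters(test_params, template):
    """Return test parameter names that the parameter template does not declare."""
    return sorted(set(test_params) - set(template))


def with_build_parameters(test_params, account_id, region):
    """Return a copy of test_params with ecr_registry and aws_region added.

    Assumption: ecr_registry is the account's private registry hostname.
    """
    merged = dict(test_params)  # leave the caller's dictionary untouched
    merged["ecr_registry"] = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    merged["aws_region"] = region
    return merged


if __name__ == "__main__":
    # Placeholder values for illustration only.
    template = {"pb_version": {}, "ecr_registry": {}, "aws_region": {}}
    params = with_build_parameters({"pb_version": "x.y.z"}, "123456789012", "us-east-1")
    print(params)
    print(unknown_parameters(params, template))
```

An empty list from unknown_parameters indicates that every test parameter is declared in the template, which is what "a subset of the parameters used" implies.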