Deployment CLI. In the commands below, replace:
- somebody@someemail.com with your username or email, used to identify the allocated resources
- miniwdl-bucket with the desired name of the S3 bucket for outputs (miniwdl-bucket is the default)
# Create a new VPC and deploy MiniWDL infrastructure in this VPC
aws cloudformation deploy --template-file cfn-miniwdl-new-vpc.yaml \
--stack-name MiniWDL-new-VPC --capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides S3UploadBucket=miniwdl-bucket Owner=somebody@someemail.com
aws cloudformation describe-stacks --stack-name MiniWDL-new-VPC
# Deploy MiniWDL infrastructure in existing VPC
aws cloudformation deploy --template-file cfn-miniwdl.yaml \
--stack-name MiniWDL --capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides \
S3UploadBucket=miniwdl-bucket \
Owner=somebody@someemail.com \
Subnet0=subnet-0b2ad3bbbe3652a00 \
Subnet1=subnet-0f6db482bddf223c8 \
SecurityGroupId=sg-058deaa09fcdadc69
aws cloudformation describe-stacks --stack-name MiniWDL
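The describe-stacks result can also be read from a script. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the helper names stack_outputs and fetch_stack_outputs are hypothetical, and which output keys the templates define is not stated here.

```python
def stack_outputs(describe_stacks_response):
    """Flatten the Outputs of the first stack in a describe_stacks response
    into a plain {OutputKey: OutputValue} dictionary."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        return {}
    outputs = stacks[0].get("Outputs", [])
    return {item["OutputKey"]: item["OutputValue"] for item in outputs}


def fetch_stack_outputs(stack_name):
    """Query CloudFormation for a stack and return its outputs as a dict."""
    import boto3  # imported lazily so stack_outputs() works without boto3

    client = boto3.client("cloudformation")
    return stack_outputs(client.describe_stacks(StackName=stack_name))


# Example (requires AWS access):
#   for key, value in fetch_stack_outputs("MiniWDL").items():
#       print(f"{key} = {value}")
```

Keeping the response parsing separate from the API call makes it possible to check the parsing against a saved response without AWS access.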
NOTE: the FSx for Lustre deployment below is NOT YET FINISHED AND IS NOT WORKING.
# Deploy MiniWDL FSx for Lustre infrastructure in existing VPC
aws cloudformation deploy --template-file cfn-miniwdl-fsx.yaml \
--stack-name MiniWDL-fsx --capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides \
S3UploadBucket=miniwdl-bucket \
Owner=somebody@someemail.com \
SubnetId=subnet-0b2ad3bbbe3652a00 \
SecurityGroupId=sg-058deaa09fcdadc69
miniwdl-aws-submit --no-efs \
--workflow-queue miniwdl-lustre-workflow \
--self-test --follow
aws cloudformation describe-stacks --stack-name MiniWDL-fsx
pip install git+https://github.com/staskh/miniwdl-aws.git
In the test commands below, replace the --s3upload value with the bucket selected during infrastructure deployment.
To test your setup, run:
miniwdl-aws-submit --self-test --follow --workflow-queue miniwdl-workflow
The same test can also be performed explicitly with:
miniwdl-aws-submit --verbose --no-cache --follow --s3upload s3://miniwdl-bucket/self_test https://raw.githubusercontent.com/staskh/miniwdl-aws/main/test_workflow/self_test/test.wdl who=https://raw.githubusercontent.com/chanzuckerberg/miniwdl/main/tests/alyssa_ben.txt
To test a GPU-based workflow, run:
miniwdl-aws-submit --verbose --no-cache --follow --s3upload s3://miniwdl-bucket/gpu_test https://raw.githubusercontent.com/staskh/miniwdl-aws/main/test_workflow/gpu_test/gpu_test.wdl
Deployment script for the miniwdl-aws cloud infrastructure, replacing the Terraform-based miniwdl-aws-terraform script.
See WDL example at https://github.com/staskh/miniwdl-aws/tree/main/test_workflow/gpu_test
Extends miniwdl to run workflows on AWS Batch and EFS
This miniwdl plugin enables it to execute WDL tasks as AWS Batch jobs. It uses EFS for work-in-progress file I/O, optionally uploading final workflow outputs to S3.
Before diving into this, first consider Amazon Omics, which includes a WDL workflow runner service that doesn't need you to deploy compute infrastructure in your AWS account. (The behind-the-scenes implementation differs from the plugin found here.)
There are a few ways to deploy this miniwdl-aws plugin:
Amazon Genomics CLI can deploy a miniwdl-aws context into your AWS account with all the necessary infrastructure.
Or, try the miniwdl-aws-studio recipe to install miniwdl for interactive use within Amazon SageMaker Studio, a web IDE with a terminal and filesystem browser. You can use the terminal to operate miniwdl run against AWS Batch, the filesystem browser to manage the inputs and outputs on EFS, and the Jupyter notebooks to further analyze the outputs.
Lastly, advanced operators can use miniwdl-aws-terraform to deploy/customize the necessary AWS infrastructure, including a VPC, EFS file system, Batch queues, and IAM roles.
In this scheme, a local command-line wrapper miniwdl-aws-submit launches miniwdl in its own small Batch job to orchestrate a workflow. This workflow job then spawns WDL task jobs as needed, without needing the submitting laptop to remain connected for the duration. The workflow jobs run on lightweight Fargate resources, while task jobs run on EC2 spot instances.
After deploying miniwdl-aws-terraform, pip3 install miniwdl-aws locally to make the miniwdl-aws-submit program available. Try the self-test:
miniwdl-aws-submit --self-test --follow --workflow-queue miniwdl-workflow
Then launch a viral genome assembly that should run in 10-15 minutes:
miniwdl-aws-submit \
https://github.com/broadinstitute/viral-pipelines/raw/v2.1.28.0/pipes/WDL/workflows/assemble_refbased.wdl \
reads_unmapped_bams=https://github.com/broadinstitute/viral-pipelines/raw/v2.1.19.0/test/input/G5012.3.testreads.bam \
reference_fasta=https://github.com/broadinstitute/viral-pipelines/raw/v2.1.19.0/test/input/ebov-makona.fasta \
sample_name=G5012.3 \
--workflow-queue miniwdl-workflow \
--s3upload s3://MY-BUCKET/assemblies \
--verbose --follow
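For scripted or repeated submissions, a command like the one above can be assembled programmatically. The following is a minimal sketch and not part of miniwdl-aws: the function names build_submit_command and submit are hypothetical, and only flags that appear in this README are used.

```python
import shlex
import subprocess


def build_submit_command(wdl, inputs, workflow_queue="miniwdl-workflow",
                         s3upload=None, follow=True, verbose=True):
    """Assemble a miniwdl-aws-submit argument list from the options above."""
    cmd = ["miniwdl-aws-submit", wdl]
    # Workflow inputs are passed as name=value pairs, as with miniwdl run.
    cmd += [f"{name}={value}" for name, value in inputs.items()]
    cmd += ["--workflow-queue", workflow_queue]
    if s3upload:
        cmd += ["--s3upload", s3upload]
    if verbose:
        cmd.append("--verbose")
    if follow:
        cmd.append("--follow")
    return cmd


def submit(cmd):
    """Echo and run the command; raises CalledProcessError on nonzero exit."""
    print("+", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with URLs and input values.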
The command line resembles miniwdl run's with extra AWS-related arguments:
- --workflow-queue: Batch job queue on which to schedule the workflow job; output from miniwdl-aws-terraform, default miniwdl-workflow. (Also set by environment variable MINIWDL__AWS__WORKFLOW_QUEUE)
- --follow: live-streams the workflow log instead of exiting immediately upon submission. (--wait blocks on the workflow without streaming the log.)
- --s3upload: (optional) S3 folder URI under which to upload the workflow products, including the log and output files (if successful). The bucket must be allow-listed in the miniwdl-aws-terraform deployment.
  - Unless --s3upload ends with /, one more subfolder is added to the uploaded URI prefix, equal to miniwdl's automatic timestamp-prefixed run name. If it does end in /, then the uploads go directly into/under that folder (and a repeat invocation would be expected to overwrite them).
miniwdl-aws-submit detects other infrastructure details (task queue, EFS access point, IAM role) based on the workflow queue; see miniwdl-aws-submit --help for additional options to override those defaults.
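The trailing-slash rule for --s3upload can be illustrated with a short sketch. This is not the plugin's code: the function name resolve_upload_prefix is hypothetical, and the run name used in the example is made up for illustration.

```python
def resolve_upload_prefix(s3upload, run_name):
    """Return the S3 prefix under which workflow products would be uploaded.

    If s3upload ends with "/", uploads go directly under that folder.
    Otherwise one more subfolder, named after the run, is appended.
    """
    if s3upload.endswith("/"):
        return s3upload
    return f"{s3upload}/{run_name}"


# Hypothetical timestamp-prefixed run name, for illustration only.
example_run = "20240101_120000_assemble_refbased"
print(resolve_upload_prefix("s3://MY-BUCKET/assemblies", example_run))
print(resolve_upload_prefix("s3://MY-BUCKET/assemblies/", example_run))
```

The first form keeps every run in its own subfolder; the second gives a stable location at the price of overwriting on repeat invocations.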
If the specified WDL source code is an existing local .wdl or .zip file, miniwdl-aws-submit automatically ships it with the workflow job as the WDL to execute. Given a .wdl file, it runs miniwdl zip to detect & include any imported WDL files; while it assumes .zip files were also generated by miniwdl zip. If the source code is too large to fit in the AWS Batch request payload (~50KB), then you'll instead have to pass it by reference to a URL or EFS path.
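A rough pre-flight check can flag a source file that is unlikely to fit. This sketch only compares the raw file size against the approximate figure quoted above; it does not model how the plugin packages or encodes the source, and the names choose_wdl_delivery and APPROX_PAYLOAD_LIMIT_BYTES are hypothetical.

```python
import os

# Approximate limit taken from the text (~50KB); the exact AWS Batch limit
# and the plugin's packaging overhead are NOT modeled here.
APPROX_PAYLOAD_LIMIT_BYTES = 50 * 1000


def choose_wdl_delivery(path, limit=APPROX_PAYLOAD_LIMIT_BYTES):
    """Return "ship" if the local file looks small enough to send with the
    workflow job, else "reference" (pass a URL or EFS path instead)."""
    if not path.endswith((".wdl", ".zip")):
        raise ValueError("expected a local .wdl or .zip file")
    return "ship" if os.path.getsize(path) <= limit else "reference"
```

Because a .wdl file is bundled with its imports before submission, a file that passes this check may still turn out too large; treat the result as a hint only.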
Arguments not consumed by miniwdl-aws-submit are passed through to miniwdl run inside the workflow job, as are environment variables whose names begin with MINIWDL__, allowing override of any miniwdl configuration option (disable with --no-env). See miniwdl_aws.cfg for various options preconfigured in the workflow job container.
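The environment pass-through can be pictured as a simple prefix filter. The sketch below is illustrative rather than the plugin's implementation; the function names are hypothetical, and the mapping of MINIWDL__SECTION__KEY to a configuration section and key reflects my understanding of miniwdl's naming convention.

```python
def passthrough_env(environ, no_env=False):
    """Select the variables that would be forwarded to the workflow job."""
    if no_env:  # corresponds to --no-env
        return {}
    return {k: v for k, v in environ.items() if k.startswith("MINIWDL__")}


def parse_override(name):
    """Split MINIWDL__SECTION__KEY into a lowercase (section, key) pair."""
    parts = name.split("__", 2)
    if len(parts) != 3 or parts[0] != "MINIWDL" or not parts[1] or not parts[2]:
        raise ValueError(f"not a MINIWDL__SECTION__KEY name: {name!r}")
    return parts[1].lower(), parts[2].lower()
```

For example, passing os.environ to passthrough_env lists what a submission from the current shell would forward.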
The workflow and task jobs all mount EFS at /mnt/efs. Although workflow input files are usually specified using HTTPS or S3 URIs, files already resident on EFS can be used with their /mnt/efs paths (which probably don't exist locally on the submitting machine). Unlike the WDL source code, miniwdl-aws-submit will not attempt to ship/upload local input files.
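Before submitting, it can help to check how each file input will be resolved. The following sketch sorts values into the categories described above; the function name classify_input and the category labels are hypothetical, and the check is purely textual.

```python
def classify_input(value, efs_mount="/mnt/efs"):
    """Classify a workflow file input by where the workflow job can read it.

    "remote": HTTPS or S3 URI
    "efs":    path under the EFS mount point
    "other":  anything else, e.g. a local path that will NOT be uploaded
    """
    if value.startswith(("https://", "s3://")):
        return "remote"
    mount = efs_mount.rstrip("/")
    if value == mount or value.startswith(mount + "/"):
        return "efs"
    return "other"
```

A file input that comes back as "other" is a sign that the file should first be copied to S3 or EFS.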
Miniwdl runs the workflow in a directory beneath /mnt/efs/miniwdl_run (override with --dir). The outputs also remain cached there for potential reuse in future runs (to avoid, submit with --no-cache or wipe /mnt/efs/miniwdl_run/_CACHE).
Given the EFS-centric I/O model, you'll need a way to browse and manage the filesystem contents remotely. The companion recipe lambdash-efs is one option; miniwdl-aws-terraform outputs the infrastructure details needed to deploy it (pick any subnet). Or, set up an instance/container mounting your EFS, to access via SSH or web app (e.g. JupyterHub, Cloud Commander, VS Code server).
You can also automate cleanup of EFS run directories by setting miniwdl-aws-submit --s3upload and:
- --delete-after success to delete the run directory immediately after successful output upload
- --delete-after failure to delete the directory after failure
- --delete-after always to delete it in either case
- (or set environment variable MINIWDL__AWS__DELETE_AFTER_S3_UPLOAD)
Deleting a run directory after success prevents the outputs from being reused in future runs. Deleting it after failures can make debugging more difficult (although logs are retained, see below).
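The three --delete-after policies amount to a small decision table. The sketch below restates it as code under the description given above; it is not the plugin's implementation, and the name should_delete_run_dir is hypothetical.

```python
VALID_POLICIES = ("success", "failure", "always")


def should_delete_run_dir(policy, succeeded):
    """Decide whether the EFS run directory is removed after a run.

    policy:    one of VALID_POLICIES, or None when --delete-after is unset
    succeeded: True if the workflow succeeded and its outputs were uploaded
    """
    if policy is None:
        return False
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown --delete-after policy: {policy!r}")
    if policy == "always":
        return True
    if policy == "success":
        return succeeded
    return not succeeded  # policy == "failure"
```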
Going through AWS Batch & EFS, miniwdl can't enforce the strict file system isolation between WDL task containers that it does locally. All the AWS Batch containers have read/write access to the entire EFS file system (as viewed through the access point), not only their initial working directory.
This is usually benign, because WDL tasks should only read their declared inputs and write into their respective working/temporary directories. But poorly- or maliciously-written tasks could read & write files elsewhere on EFS, even changing their own input files or those of other tasks. This risks unintentional side-effects or worse security hazards from untrusted code.
To mitigate this, test workflows thoroughly using the local backend, which strictly isolates task containers' file systems. If WDL tasks insist on modifying their input files in place, then --copy-input-files
can unblock them (at a cost in time, space, and IOPS). Lastly, avoid using untrusted WDL code or container images; but if they're necessary, then use a separate EFS access point and restrict the IAM and network configuration for the AWS Batch containers appropriately.
To scale up to larger workloads, it's important to study AWS documentation on EFS performance and monitoring. Like any network file system, EFS limits on throughput and IOPS can cause bottlenecks; and worse, exhaustion of the default bursting throughput mode can effectively freeze a workflow.
Management tips:
- Monitor file system throughput limits, IOPS, and burst credits in the EFS area of the AWS Console.
- Stage large datasets onto the file system well in advance, increasing the available burst throughput.
- Enable the Elastic or Provisioned throughput modes (at increased cost).
- Code WDL tasks to write any purely-temporary files into $TMPDIR, which may use local scratch space, instead of the EFS working directory.
- Configure miniwdl and AWS Batch to limit the number of concurrent jobs and/or the rate at which they turn over (see miniwdl_aws.cfg for relevant details).
- Spread out separate workflow runs over time or across multiple EFS file systems.
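Burst credits can also be checked from a script rather than the console. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; it reads the BurstCreditBalance metric that EFS publishes to CloudWatch. The function names and the three-hour window are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def burst_credit_query(file_system_id, hours=3, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EFS",
        "MetricName": "BurstCreditBalance",
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # seconds per datapoint
        "Statistics": ["Minimum"],
    }


def lowest_burst_credits(file_system_id, hours=3):
    """Return the lowest credit balance in the window, or None if no data."""
    import boto3  # imported lazily so burst_credit_query() works without it

    cloudwatch = boto3.client("cloudwatch")
    resp = cloudwatch.get_metric_statistics(**burst_credit_query(file_system_id, hours))
    values = [point["Minimum"] for point in resp["Datapoints"]]
    return min(values) if values else None
```

A balance trending toward zero during a run is the warning sign for the throughput freeze described above.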
If EFS performance remains insufficient, then you can configure your Batch compute environments to automatically mount some other shared filesystem upon instance startup. Then use miniwdl-aws-submit --no-efs to make it assume the filesystem will already be mounted at a certain location (default --mount /mnt/net) across all instances. In this case, the compute environment for workflow jobs is expected to use EC2 instead of Fargate resources (usually necessary for mounting).
The miniwdl-aws-terraform repo includes a variant setting this up with FSx for Lustre. FSx offers higher throughput scalability, but has other downsides compared to EFS (higher upfront costs, manual capacity scaling, single-AZ deployment, fewer AWS service integrations).
If the terminal log isn't available (through Studio or miniwdl-aws-submit --follow) to trace a workflow failure, look for miniwdl's usual log files written in the run directory on EFS or copied to S3.
Each task job's log is also forwarded to CloudWatch Logs under the /aws/batch/job group and a log stream name reported in miniwdl's log. Using miniwdl-aws-submit, the workflow job's log is also forwarded. CloudWatch Logs indexes the logs for structured search through the AWS Console & API.
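A task's log stream can be read through the API once its name is known from miniwdl's log. The following is a minimal sketch, assuming a boto3 CloudWatch Logs client is passed in; the function name read_log_stream is hypothetical.

```python
def read_log_stream(logs_client, stream_name, group_name="/aws/batch/job"):
    """Yield the messages of a CloudWatch Logs stream, oldest first."""
    kwargs = {
        "logGroupName": group_name,
        "logStreamName": stream_name,
        "startFromHead": True,
    }
    token = None
    while True:
        if token is not None:
            kwargs["nextToken"] = token
        page = logs_client.get_log_events(**kwargs)
        for event in page["events"]:
            yield event["message"]
        next_token = page.get("nextForwardToken")
        # The end of the stream is reached when the service hands back
        # the same forward token that was just sent.
        if next_token is None or next_token == token:
            break
        token = next_token


# Example (requires AWS access; the stream name comes from miniwdl's log):
#   import boto3
#   for line in read_log_stream(boto3.client("logs"), "<log stream name>"):
#       print(line)
```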
Misconfigured infrastructure might prevent logs from being written to EFS or CloudWatch at all. In that case, use the AWS Batch console/API to find status messages for the workflow or task jobs.
Pull requests are welcome! For help, open an issue here or drop in on #miniwdl in the OpenWDL Slack.
Code formatting and linting. To prepare your code to pass the CI checks,
pip3 install --upgrade -r test/requirements.txt
pre-commit run --all-files
Running tests. In an AWS-credentialed terminal session,
MINIWDL__AWS__WORKFLOW_QUEUE=miniwdl-workflow test/run_tests.sh
This builds the requisite Docker image from the current code revision and pushes it to an ECR repository (which must be prepared once with aws ecr create-repository --repository-name miniwdl-aws). To test an image from the GitHub public registry or some other version, set MINIWDL__AWS__WORKFLOW_IMAGE to the desired tag.