Schemas for the Workflow Execution Service (WES) API

This is used by the Data Working Group - Containers and Workflows Task Team

View in Swagger

The Global Alliance for Genomics and Health is an international coalition, formed to enable the sharing of genomic and clinical data.

Containers and Workflows Task Team

The Data Working Group concentrates on data representation, storage, and analysis, including working with platform development partners and industry leaders to develop standards that will facilitate interoperability. The Containers & Workflows working group is an informal, multi-vendor working group focused on standards for exchanging Docker-based tools and CWL/WDL workflows, execution of Docker-based tools and workflows on clouds, and abstract access to cloud object stores.

What is WES?

This is the home of the Workflow Execution Schema proposal. The Workflow Execution Schema is a minimal common API describing how a user can submit workflow requests to workflow execution systems in a standardized way. Workflow execution engines (SevenBridges, FireCloud, etc.) can support this API so that users can make workflow requests programmatically, which makes it possible to scale up. In addition, these workflow services probably have UIs, which could use this API under the hood to submit workflow execution requests.

Having this standard API supported by multiple execution engines will let people run the same workflow (CWL or WDL) on different workflow execution platforms across various clouds and environments. As an example use case, one can find a workflow in CWL on Dockstore.org, use Dockstore to generate a JSON parameterization file, and submit this to a GA4GH-compliant workflow execution service.

Key features of the current API proposal:

  • ability to request a workflow run using CWL or WDL (and maybe future formats)
  • ability to parameterize that workflow using a JSON schema (ideally a future version would be in common between CWL and WDL)
  • ability to get information about running workflows, such as status, errors, and output file locations
  • ability to search for workflows by arbitrary key/value pairs
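
The features above can be illustrated with a small Python sketch. It is not the schema itself: the field names (workflow_url, workflow_type, workflow_params, tags), the helper names, and the example URLs are all hypothetical and chosen only for illustration. The request message defined in the proposal is authoritative.

```python
import json

# NOTE: every field name below is an assumption made for illustration;
# consult the schema for the real request message.
SUPPORTED_TYPES = ("CWL", "WDL")


def build_workflow_request(descriptor_url, workflow_type, params, tags=None):
    """Assemble a workflow run request as a plain dictionary."""
    if workflow_type not in SUPPORTED_TYPES:
        raise ValueError("unsupported workflow type: %s" % workflow_type)
    return {
        "workflow_url": descriptor_url,    # where the CWL/WDL document lives
        "workflow_type": workflow_type,    # "CWL" or "WDL"
        "workflow_params": params,         # the JSON parameterization
        "tags": dict(tags or {}),          # arbitrary key/value pairs
    }


def matches_tags(request, wanted):
    """Return True if every wanted key/value pair appears in the request's tags."""
    tags = request.get("tags", {})
    return all(key in tags and tags[key] == value
               for key, value in wanted.items())


if __name__ == "__main__":
    request = build_workflow_request(
        "https://example.org/workflows/align.cwl",
        "CWL",
        {"input_file": {"class": "File", "path": "s3://example-bucket/sample.bam"}},
        tags={"project": "demo"},
    )
    print(json.dumps(request, indent=2, sort_keys=True))
```

A service could apply a check like matches_tags to each stored request to implement the key/value search, although a real implementation would more likely push the filter down to its datastore.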

Outstanding questions:

  • a common JSON parameterization format (see work by Peter; is that checked in?)
  • standardizing terms: job, workflow, steps, tools, etc.
  • reference implementation at https://github.com/common-workflow-language/cwltool-service/tree/ga4gh-wes
  • validation service for testing WES implementations' conformance to the spec
  • Including all task_logs in the workflow log request may present a scaling problem when there are hundreds to thousands of tasks
  • Providing a state notification callback URL (e.g. a webhook)
  • Passing through authentication (user role)

How to view

See the swagger editor to view our schema in progress.

If the current schema fails to validate, visit debugging

Building Documents

Make sure you have Docker installed for your platform, along with cwltool.

virtualenv env
source env/bin/activate
pip install setuptools==28.8.0
pip install cwl-runner cwltool==1.0.20161114152756 schema-salad==1.18.20161005190847 avro==1.8.1

Make sure you have the submodule checked out:

git submodule update --init --recursive

You can generate the Swagger JSON from the Protocol Buffers:

cwltool CWLFile

The output is written to workflow_execution.swagger.json, which can be loaded in the Swagger editor. Use the GitHub raw feature to generate a URL you can load.
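
Before loading the file in the editor, it can help to check quickly that the generated document parses and exposes the operations you expect. The following is a minimal sketch, assuming the output is a Swagger 2.0 JSON document with a top-level "paths" object. The function name list_operations is hypothetical and not part of this repository.

```python
import json
import sys

# HTTP methods that Swagger 2.0 allows as keys of a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch"}


def list_operations(swagger_path):
    """Return sorted (METHOD, path) pairs found in a Swagger 2.0 JSON file."""
    with open(swagger_path) as handle:
        spec = json.load(handle)
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also carry non-operation keys such as "parameters".
            if key in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations)


if __name__ == "__main__":
    # Pass the generated file as an argument,
    # e.g. workflow_execution.swagger.json
    for swagger_path in sys.argv[1:]:
        for method, path in list_operations(swagger_path):
            print(method, path)
```

If the file is not valid JSON, json.load raises an error, which is a cheap signal that the generation step went wrong before you commit anything.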

When you're happy with the changes, check in this file:

mv workflow_execution.swagger.json swagger/proto/

And commit your changes.

How to contribute changes

Take cues for now from the ga4gh/schemas document.

We like HubFlow and using pull requests to suggest changes.

License

See the [LICENSE]

More Information