Reverse / Complement in Common Workflow Language
Here are two tools and two workflows, all written to demonstrate features of the Common Workflow Language.
In reverse/ and complement/ there are simple Python scripts that implement the reverse and complement operations on a file containing a DNA string. There are also corresponding Docker images and the Dockerfiles for building them.
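The scripts themselves are not reproduced in this README. To give a rough idea of what the two operations involve, here is a minimal sketch, assuming the input file holds a single DNA string (possibly wrapped over several lines) made up of the bases A, C, G and T. The function names are hypothetical and not taken from reverse.py or complement.py.

```python
# Hypothetical sketch of the reverse and complement operations;
# not the repository's reverse.py / complement.py.
import sys

# Watson-Crick base pairing; upper and lower case are both handled.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_dna(path: str) -> str:
    """Read a DNA string from a file, joining lines and dropping whitespace."""
    with open(path) as handle:
        return "".join(handle.read().split())


def reverse(dna: str) -> str:
    """Return the sequence in reverse order."""
    return dna[::-1]


def complement(dna: str) -> str:
    """Return the complement of each base (A<->T, C<->G)."""
    return dna.translate(COMPLEMENT)


if __name__ == "__main__":
    # Mirroring reverse.cwl: the input file is the first command line
    # argument and the result is written to stdout.
    if len(sys.argv) == 2:
        print(reverse(read_dna(sys.argv[1])))
```

Keeping each operation a pure string function makes it easy to compose them, which is exactly what the revcomp workflow later does at the file level.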
reverse: a Common Workflow Language tool description
Each tool has a Common Workflow Language tool description file. The one for the reverse tool, reverse.cwl, is reproduced below:
cwlVersion: v1.0
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: pvanheus/reverse:latest
baseCommand: reverse.py
inputs:
  dnafile:
    type: File
    inputBinding:
      position: 1
stdout: $(inputs.dnafile.nameroot)_reversed$(inputs.dnafile.nameext)
outputs:
  rev_dnafile:
    type: stdout
For an introduction to CWL, see the User Guide.
A few items bear mentioning, however. The Docker image is associated with the tool using a hint rather than a requirement, so this tool can be run either with or without (Docker) container support. The baseCommand refers to the command line tool being run (reverse.py), and a single command line argument, the input file, is specified. Given this command line, reverse.py writes its output to stdout, which is captured and redirected to a file.
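Capturing stdout into a file is work the CWL runner does on the tool's behalf. As a rough illustration only (not cwltool's implementation), the sketch below runs a command with Python's subprocess module and sends its standard output to a named file. The helper name run_capturing_stdout is hypothetical.

```python
import subprocess
import sys
from typing import Sequence


def run_capturing_stdout(argv: Sequence[str], stdout_path: str) -> int:
    """Run argv and redirect its standard output into stdout_path.

    Hypothetical helper illustrating the `stdout:` field of a tool
    description; returns the process exit code.
    """
    with open(stdout_path, "w") as out:
        completed = subprocess.run(list(argv), stdout=out, check=False)
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for `reverse.py dna.txt`: a tiny Python command that prints text.
    rc = run_capturing_stdout([sys.executable, "-c", "print('TGCA')"],
                              "captured.txt")
    print("exit code:", rc)
```

Because the tool only ever writes to stdout, the choice of output filename stays entirely in the tool description, which is what makes the filename pattern below possible.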
Instead of giving a fixed string as the output filename, properties of the input are used in a filename pattern. In CWL, $() denotes a parameter reference, that is, a variable lookup. The dnafile input is of type File; you can read about its attributes, including nameroot and nameext, in the corresponding section of the CWL specification. The effect of this pattern is that if the input is dna.txt, the output will be dna_reversed.txt.
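To make the pattern concrete, here is a sketch of how nameroot and nameext relate to a file's basename. It follows the CWL specification's description (nameroot + nameext == basename, nameext is empty or starts with a single period, and leading periods are ignored), which Python's os.path.splitext happens to match. The helper names are hypothetical.

```python
import os.path
from typing import Tuple


def name_parts(basename: str) -> Tuple[str, str]:
    """Split a basename into (nameroot, nameext) as the CWL File type defines them."""
    return os.path.splitext(basename)


def reversed_name(basename: str) -> str:
    """Mimic $(inputs.dnafile.nameroot)_reversed$(inputs.dnafile.nameext)."""
    nameroot, nameext = name_parts(basename)
    return f"{nameroot}_reversed{nameext}"


if __name__ == "__main__":
    for name in ["dna.txt", "archive.tar.gz", "dna"]:
        print(name, "->", reversed_name(name))
```

Note that only the last extension is split off, so archive.tar.gz becomes archive.tar_reversed.gz.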
Finally, a single output, named rev_dnafile, is captured from the tool and associated with the previously captured stdout.
This tool description allows the tool to be run either from the command line, using a runner such as cwltool, or via other mechanisms provided by CWL-supporting workflow management systems. Effectively, the tool description translates between an abstract description of inputs and outputs and the concrete command line parameters and output files of the tool.
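The input-to-command-line half of that translation can be sketched in a few lines. This is a simplified model, assuming only positional inputBinding entries (no prefixes, arrays or other binding options) and a File represented by its path. The function and the dict layout are hypothetical; real runners such as cwltool implement much more of the specification.

```python
from typing import Any, Dict, List


def build_command(base_command: List[str],
                  inputs_schema: Dict[str, Dict[str, Any]],
                  job: Dict[str, str]) -> List[str]:
    """Build argv from baseCommand plus positional inputBinding entries.

    Bound inputs are ordered by inputBinding.position (default 0), then by
    name; a simplification of the CWL ordering rules. Inputs without an
    inputBinding do not appear on the command line.
    """
    bound = [
        (spec["inputBinding"].get("position", 0), name)
        for name, spec in inputs_schema.items()
        if "inputBinding" in spec and name in job
    ]
    argv = list(base_command)
    for _position, name in sorted(bound):
        argv.append(job[name])
    return argv


# The inputs section of reverse.cwl, expressed as a Python dict.
REVERSE_INPUTS = {"dnafile": {"type": "File", "inputBinding": {"position": 1}}}

if __name__ == "__main__":
    print(build_command(["reverse.py"], REVERSE_INPUTS,
                        {"dnafile": "/data/dna.txt"}))
```

Combined with the stdout redirection sketched earlier, this is essentially what running reverse.cwl amounts to: reverse.py /data/dna.txt > dna_reversed.txt.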
revcomp: a Common Workflow Language Workflow
The reverse and complement tools can be chained together using a workflow. This is what such a workflow could look like (revcomp.cwl):
cwlVersion: v1.0
class: Workflow
requirements:
  - class: InlineJavascriptRequirement
inputs:
  infile:
    type: File
    inputBinding:
      position: 1
outputs:
  revcomp_dnafile:
    type: File
    outputSource: complement/comp_dnafile
steps:
  reverse:
    run: reverse.cwl
    in:
      dnafile: infile
    out: [rev_dnafile]
  complement:
    run: complement.cwl
    in:
      dnafile: reverse/rev_dnafile
    out: [comp_dnafile]
Note that this is class: Workflow instead of class: CommandLineTool. It has its own inputs and outputs, just like a command line tool. Instead of linking to a command (with baseCommand), a workflow has steps, which in this case each refer to a CWL command line tool description file. In each step, the inputs provided and the outputs used are listed under in and out respectively. The infile input from inputs is bound to the dnafile parameter of reverse.cwl. The rev_dnafile output of this tool is then bound to the dnafile parameter of complement.cwl. Finally, comp_dnafile from the complement step is bound (using outputSource) to the revcomp_dnafile output of the workflow as a whole.
At this (workflow) level the implementation details of the steps are hidden. The workflow has no need to know about the structure of the command line or about Docker containers. It only needs to know the steps and how they relate to each other.
One effect of how the steps are written is that the final output filename inherits the pattern of each step, so dna.txt becomes dna_reversed_complement.txt.
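To illustrate how the in/out wiring and the filename inheritance fit together, here is a toy evaluator. It is a sketch, not how cwltool works internally: each step is a plain Python function that only computes an output filename (assuming complement.cwl uses a _complement suffix, as the dna_reversed_complement.txt example implies), a source written as step/output is looked up among earlier step results, and a bare name refers to a workflow input. All names are hypothetical.

```python
import os.path
from typing import Callable, Dict, Tuple

# A hypothetical "tool": takes input values, returns a dict of output values.
Tool = Callable[[Dict[str, str]], Dict[str, str]]
# A step pairs a tool with its `in:` wiring (parameter name -> source).
Step = Tuple[Tool, Dict[str, str]]


def suffix_tool(suffix: str, out_name: str) -> Tool:
    """Make a fake tool that only derives an output filename from dnafile."""

    def run(inputs: Dict[str, str]) -> Dict[str, str]:
        nameroot, nameext = os.path.splitext(inputs["dnafile"])
        return {out_name: f"{nameroot}{suffix}{nameext}"}

    return run


def resolve(source: str, wf_inputs: Dict[str, str],
            results: Dict[str, Dict[str, str]]) -> str:
    """'step/output' names an earlier step's output; a bare name a workflow input."""
    if "/" in source:
        step_name, output = source.split("/", 1)
        return results[step_name][output]
    return wf_inputs[source]


def run_workflow(steps: Dict[str, Step], wf_inputs: Dict[str, str],
                 output_source: str) -> str:
    """Run steps in dict order (assumed already dependency-sorted)."""
    results: Dict[str, Dict[str, str]] = {}
    for name, (tool, wiring) in steps.items():
        step_inputs = {param: resolve(src, wf_inputs, results)
                       for param, src in wiring.items()}
        results[name] = tool(step_inputs)
    # Equivalent of outputSource on the workflow-level output.
    return resolve(output_source, wf_inputs, results)


# revcomp.cwl's steps, with fake tools standing in for reverse.cwl/complement.cwl.
REVCOMP_STEPS: Dict[str, Step] = {
    "reverse": (suffix_tool("_reversed", "rev_dnafile"),
                {"dnafile": "infile"}),
    "complement": (suffix_tool("_complement", "comp_dnafile"),
                   {"dnafile": "reverse/rev_dnafile"}),
}

if __name__ == "__main__":
    # Prints dna_reversed_complement.txt
    print(run_workflow(REVCOMP_STEPS, {"infile": "dna.txt"},
                       "complement/comp_dnafile"))
```

A real engine derives the execution order from these references rather than trusting the order in which steps are written; the sketch skips that to stay short.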
The final example, revcomp_with_rename.cwl, illustrates some of the more advanced features of CWL. First, this workflow takes an optional string parameter (outfile_name) giving the name of the workflow's output file. The parameter is optional because its type is string?: as in regular expressions, the ? denotes that the parameter does not have to be provided. If it is not provided, the output filename is the same as for revcomp.cwl.
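The ? suffix is syntactic sugar. In CWL, a type written as string? is shorthand for the union ["null", "string"], and a [] suffix (for example File[]) is shorthand for an array type. The sketch below expands just these two shorthands; the function names are hypothetical and all other forms a type can take are ignored.

```python
from typing import Union


def expand_type(t: str) -> Union[str, list, dict]:
    """Expand type shorthand: 'T?' -> ['null', T], 'T[]' -> array of T."""
    if t.endswith("?"):
        return ["null", expand_type(t[:-1])]
    if t.endswith("[]"):
        return {"type": "array", "items": expand_type(t[:-2])}
    return t


def accepts_missing(t: str) -> bool:
    """True if the expanded type is a union that includes null (i.e. optional)."""
    expanded = expand_type(t)
    return isinstance(expanded, list) and "null" in expanded


if __name__ == "__main__":
    for t in ["string?", "File[]", "string[]?", "File"]:
        print(t, "->", expand_type(t))
```

Seen this way, "optional" simply means that null is one of the permitted values, which is why the expression further down can test whether outfile_name was set.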
The second difference from revcomp.cwl is the step that allows output renaming. Instead of running a command line tool, it evaluates a JavaScript expression. This requires InlineJavascriptRequirement to be specified, and a step whose run is an inline process of class ExpressionTool:
rename:
  run:
    class: ExpressionTool
    inputs:
      infile:
        type: File
      outfile_name:
        type: string?
    outputs:
      outfile: File
    expression: >
      ${
        var outfile = inputs.infile;
        if (inputs.outfile_name) {
          outfile.basename = inputs.outfile_name;
        }
        return { "outfile": outfile };
      }
  in:
    infile: complement/comp_dnafile
    outfile_name: outfile_name
  out: [outfile]
The ExpressionTool also has an optional outfile_name parameter. The JavaScript tests whether it is set and, if so, uses it to modify outfile.basename, so that the output file has the name specified by the user. The expression returns an object whose keys correspond to the names of the outputs declared for the tool.
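For readers more at home in Python than JavaScript, the expression's logic can be mirrored as a plain function operating on a File object represented as a dict. This is only an illustration of what the expression computes (CWL evaluates the JavaScript itself); the function name and the sample File fields are hypothetical.

```python
import copy
from typing import Any, Dict, Optional


def rename_expression(infile: Dict[str, Any],
                      outfile_name: Optional[str] = None) -> Dict[str, Any]:
    """Mirror the rename ExpressionTool: optionally replace the File's basename."""
    # Copy so the caller's dict is untouched; the JavaScript version modifies
    # the input object in place.
    outfile = copy.deepcopy(infile)
    if outfile_name:  # None or "" leaves the name unchanged, as in JavaScript
        outfile["basename"] = outfile_name
    # Keys of the returned dict match the tool's declared outputs.
    return {"outfile": outfile}


if __name__ == "__main__":
    f = {"class": "File", "basename": "dna_reversed_complement.txt"}
    print(rename_expression(f))
    print(rename_expression(f, "revcomp.txt"))
```

Empty strings are treated as "not set" in both versions, since an empty string is falsy in Python and in JavaScript.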
When using cwltool, InlineJavascriptRequirement requires Node.js (the node command), either installed as a package or provided via a Docker container that can run node. The JavaScript code is sandboxed and cannot access anything outside its limited execution context.
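Before running such a workflow without containers, it can help to check whether a Node.js executable is on the PATH. The sketch below uses shutil.which; the candidate names node and nodejs (the latter used by some Linux distribution packages) are an assumption, and cwltool's own lookup logic may differ.

```python
import shutil
from typing import Iterable, Optional


def find_node(candidates: Iterable[str] = ("node", "nodejs")) -> Optional[str]:
    """Return the full path of the first candidate executable on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    node = find_node()
    if node is None:
        print("No node/nodejs on PATH: install Node.js or provide it via a container.")
    else:
        print(f"Found Node.js at {node}")
```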
Conclusion
These examples show just a little of what can be done with the Common Workflow Language. While they don't do anything particularly useful, they should illustrate how the Common Workflow Language allows you to map command line realities to a higher-level tool and workflow specification. These specifications can then be shared and improved on, paving the way towards collaborative and reproducible science!
Using SoftwareRequirements and Dependency Resolvers
CWL offers an alternative to containers: Software Requirements. These
are described in the cwltool documentation
and allow software package names to be specified instead of container names.
This is effectively built around the Galaxy dependency resolvers system
as described in the Galaxy docs
and this blog post.
For these to work, galaxy-lib needs to be installed and a configuration file such as dependency-resolvers.yml needs to be provided. The provided example shows how to point to a tools directory using the Galaxy packages resolver type. In the tools/ directory of this repository you can find a jedi package (version 1.0.0) which contains the reverse.py and complement.py scripts, laid out in the directory structure expected for Galaxy packages. With the base path set correctly in dependency-resolvers.yml, you can run, for example, the reverse.cwl tool with the command:
cwl-runner --no-container --beta-dependency-resolvers-configuration dependency-resolvers.yml reverse/reverse.cwl --dnafile data/dna.txt
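When the resolver does not find a package, the layout of the tools directory is the first thing to check. The sketch below approximates a Galaxy-packages style lookup under <base_path>/<name>/<version>. It assumes that an env.sh file in that directory takes priority and that otherwise a bin/ subdirectory is what gets added to the path; this reflects our understanding of the Galaxy convention, not galaxy-lib's actual code, and the function name is hypothetical.

```python
import os
from typing import Optional


def galaxy_package_path(base_path: str, name: str, version: str) -> Optional[str]:
    """Approximate a Galaxy-packages lookup for <base_path>/<name>/<version>.

    Returns the env.sh script if present, else the bin/ directory that would
    be added to the path, else None. An approximation for checking a
    directory layout, not galaxy-lib's implementation.
    """
    package_dir = os.path.join(base_path, name, version)
    env_script = os.path.join(package_dir, "env.sh")
    if os.path.isfile(env_script):
        return env_script
    bin_dir = os.path.join(package_dir, "bin")
    if os.path.isdir(bin_dir):
        return bin_dir
    return None


if __name__ == "__main__":
    # For this repository: the jedi package, version 1.0.0, under tools/.
    print(galaxy_package_path("tools", "jedi", "1.0.0"))
```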
Note that this only sets up a path so that the required scripts are found; it does not resolve their own dependencies, such as the click Python module. This is a limitation of naive SoftwareRequirements that can be remedied by using a more sophisticated dependency resolver such as conda.
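A quick way to see whether the scripts' own Python dependencies are available is to ask the interpreter whether it can find the module. This sketch uses importlib.util.find_spec, with click as the example from the text; the function name is hypothetical. It only checks the interpreter that runs it, which may not be the one reverse.py ends up using.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names the current interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["click"])
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All modules found.")
```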