CWLMake
Closed this issue · 11 comments
- Get an overview of prior art, e.g. check https://github.com/tom-tan/zatsu-cwl-generator.
- A minimal templating system to make writing CWL easier.
- Possibly extensible to enrich CWL with provenance information (Linked Data).
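The templating idea could start as small as a single function that expands a compact description into a full CommandLineTool. The following is a minimal sketch under our own assumptions. The `make_tool` helper and its parameters are hypothetical. Inputs are bound positionally in the order given. The optional `author` field adds schema.org metadata via `$namespaces`, which is one way CWL documents can carry Linked Data annotations. The result is printed as JSON, which CWL accepts as well as YAML.

```python
import json


def make_tool(base_command, inputs, stdout_name=None, docker_image=None, author=None):
    """Expand a compact description into a CWL v1.2 CommandLineTool dict.

    Hypothetical helper: `inputs` is a list of (name, cwl_type, default) tuples,
    bound as positional arguments in the order given.
    """
    tool = {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "baseCommand": base_command,
        "inputs": {},
        "outputs": {},
    }
    for position, (name, cwl_type, default) in enumerate(inputs, start=1):
        entry = {"type": cwl_type, "inputBinding": {"position": position}}
        if default is not None:
            entry["default"] = default
        tool["inputs"][name] = entry
    if stdout_name:
        # Capture stdout into a named file and expose it as the tool's output.
        tool["stdout"] = stdout_name
        tool["outputs"]["result"] = {"type": "stdout"}
    if docker_image:
        tool["hints"] = {"DockerRequirement": {"dockerPull": docker_image}}
    if author:
        # Linked Data annotation using schema.org terms (assumed vocabulary choice).
        tool["$namespaces"] = {"s": "https://schema.org/"}
        tool["s:author"] = [{"class": "s:Person", "s:name": author}]
    return tool


if __name__ == "__main__":
    # Roughly the "CWL (alternative)" tool from the comparison below, run in Docker.
    tool = make_tool(
        "python3",
        [
            ("script", "File", {"class": "File", "path": "./print.py"}),
            ("message", "string", "Hello World"),
        ],
        stdout_name="output.txt",
        docker_image="python:3.9",
    )
    print(json.dumps(tool, indent=2))
```

The point of the design is that the user only lists what differs between tools. Everything CWL requires around it (class, version, bindings) is filled in by the template.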
Prior art
Besides https://github.com/tom-tan/zatsu-cwl-generator (last commit 2 years ago), there are also:
- cwl-ex, an experimental CWL grammar (last commit 1 year ago)
- WDL 2 CWL Translator
- Janis Code to CWL (last commit 1 year ago)
- argparse2tool (last commit 2 years ago)
- pypi2cwl (last commit 3 years ago)
- IPython2CWL (last commit 4 years ago)
- CliHelpParser (last commit 3 years ago)
- Galaxy2CWL (last commit 4 years ago)
- cCWL (c for concise), using LISP syntax
- workflow-inference-compiler
- DATAPlant CWL Generator (last commit 2 years ago), which generates CWL by asking questions
- DATAPlant ISA CWL Converter (last commit 1 year ago) (not sure what this is...)
There is also Rabix Composer (last commit 3 years ago), a tool for visually editing CWL files, which uses CWL-SVG (last commit 1 year ago). Composer is now developed as a closed-source tool for sbgenomics.
In addition, there are various libraries for Python, JS/TS, C++, C#/F#, D, Java and R, most of them auto-generated from the specs.
Comparison of workflow languages executing a simple Python script with a string parameter and writing back an output file:
CWL:

cwlVersion: v1.2
class: CommandLineTool
requirements:
  InitialWorkDirRequirement:
    listing:
      - entryname: print.py
        entry:
          $include: ./print.py
inputs:
  message:
    type: string
    default: Hello World
    inputBinding:
      position: 1
outputs:
  output_file:
    type: File
    outputBinding:
      glob: helloworld.txt
baseCommand: [python3, ./print.py]

CWL (alternative):

cwlVersion: v1.2
class: CommandLineTool
inputs:
  file:
    type: File
    default:
      class: File
      path: ./print.py
    inputBinding:
      position: 1
  message:
    type: string
    default: Hello World
    inputBinding:
      position: 2
outputs:
  output_file:
    type: File
    outputBinding:
      glob: helloworld.txt
baseCommand: python3
Nextflow:

process helloworld {
    input:
    val greeting

    output:
    file 'helloworld.txt'

    script:
    """ python3 ${projectDir}/print.py \"${greeting}\" """
}

params.greeting = "Hello World"

workflow {
    helloworld(params.greeting)
}

Snakemake:

rule HelloWorld:
    input:
        thefile="input.txt"
    output:
        "helloworld.txt"
    shell:
        "greeting=\"$(cat {input.thefile})\" && "
        "python3 ./print.py \"$greeting\""
- What should the source language be?
  - A custom language (Harald is lukewarm about this)
  - Reuse of prior art (e.g. https://github.com/common-workflow-lab/cwl-ex)
  - A subset of Nextflow
  - A subset of Snakemake
Currently, the CommandLineTools are generated by a set of Python scripts that apply regular expressions to the R files, combined with hints in the comments of those files, which was the interim solution to complete
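Those scripts are not shown here, so the following is only a minimal sketch of the general approach. The hint syntax `# @cwl-input <name> <type> [default=<value>]` is a hypothetical convention invented for this example, as are the function names. The script is staged with `InitialWorkDirRequirement` and run through `Rscript`, mirroring the first CWL example in the comparison above.

```python
import json
import re

# Hypothetical comment convention: "# @cwl-input <name> <type> [default=<value>]"
HINT = re.compile(r"^\s*#\s*@cwl-input\s+(\w+)\s+([\w?\[\]]+)(?:\s+default=(.*?))?\s*$")


def parse_default(raw):
    """Interpret a default as JSON (numbers, booleans, quoted strings), else keep the raw text."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def inputs_from_r_source(source):
    """Collect CWL inputs from hint comments, bound positionally in order of appearance."""
    inputs = {}
    for line in source.splitlines():
        match = HINT.match(line)
        if not match:
            continue
        name, cwl_type, default = match.groups()
        entry = {"type": cwl_type, "inputBinding": {"position": len(inputs) + 1}}
        if default is not None:
            entry["default"] = parse_default(default)
        inputs[name] = entry
    return inputs


def tool_from_r_script(script_name, source):
    """Wrap an R script in a CommandLineTool that stages the script and calls Rscript."""
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "requirements": {
            "InitialWorkDirRequirement": {
                "listing": [{"entryname": script_name, "entry": {"$include": f"./{script_name}"}}]
            }
        },
        "baseCommand": ["Rscript", f"./{script_name}"],
        "inputs": inputs_from_r_source(source),
        "outputs": {},
    }
```

Keeping the hints in comments means the R script itself stays runnable unchanged. The trade-off is that outputs still need their own convention, which this sketch leaves empty.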
One could also use libraries such as BaklavaJS as a node-graph frontend to create workflows, similar to the deprecated Composer app.
I did a quick (local) prototype yesterday. It loads CWL CommandLineTools from disk (using the FileSystemHandle API) and adds them as nodes. I think one could export the graph as a Workflow, but this is not implemented yet.
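Exporting the graph could boil down to mapping nodes to workflow steps and edges to step input sources. Below is a minimal sketch assuming a hypothetical in-memory graph. `Node`, `Edge` and `graph_to_workflow` are invented names, not part of BaklavaJS or the prototype. Each node references a CommandLineTool file and lists its port types. Unconnected input ports become workflow inputs, and output ports that nothing consumes become workflow outputs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    run: str                  # path of the CommandLineTool file loaded from disk
    inputs: dict[str, str]    # input port id -> CWL type
    outputs: dict[str, str]   # output port id -> CWL type


@dataclass
class Edge:
    src: str       # source node id
    src_port: str  # output port on the source node
    dst: str       # target node id
    dst_port: str  # input port on the target node


def graph_to_workflow(nodes: dict[str, Node], edges: list[Edge]) -> dict:
    """Translate a node graph into a CWL v1.2 Workflow dict."""
    incoming = {(e.dst, e.dst_port): e for e in edges}
    consumed = {(e.src, e.src_port) for e in edges}
    workflow = {"cwlVersion": "v1.2", "class": "Workflow", "inputs": {}, "outputs": {}, "steps": {}}
    for step_id, node in nodes.items():
        step_in = {}
        for port, cwl_type in node.inputs.items():
            edge = incoming.get((step_id, port))
            if edge is not None:
                # Connected port: read from the upstream step's output ("step/output").
                step_in[port] = f"{edge.src}/{edge.src_port}"
            else:
                # Unconnected port: expose it as a workflow-level input.
                wf_input = f"{step_id}_{port}"
                workflow["inputs"][wf_input] = cwl_type
                step_in[port] = wf_input
        workflow["steps"][step_id] = {"run": node.run, "in": step_in, "out": list(node.outputs)}
        for port, cwl_type in node.outputs.items():
            if (step_id, port) not in consumed:
                # Dangling output: surface it as a workflow output.
                workflow["outputs"][f"{step_id}_{port}"] = {
                    "type": cwl_type,
                    "outputSource": f"{step_id}/{port}",
                }
    return workflow
```

Fan-in (several edges into one port) would need `MultipleInputFeatureRequirement` and is not handled here. In this sketch, the last edge into a port silently wins.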
There is also CWL-SVG, which is used by the Composer app, but issues from 2018 are still open without an answer, and the standalone sample no longer works because a package registry it depends on no longer exists.
See
- #30
- fairagro/m4.4_concept#25
There is also WDL (Workflow Description Language), for which converters to CWL already seem to exist (https://github.com/common-workflow-lab/wdl-cwl-translator, last commit yesterday), and a huge number of tools is available on Dockstore. Both CWL and WDL are supported by the Toil runner: https://toil.readthedocs.io/en/latest/
OpenWDL released version 1.2 of its spec earlier this year. The syntax looks as if CWL and Nextflow had a child ^^
🤔
version 1.2

task hello_task {
  input {
    File infile
    String pattern
  }

  command <<<
    grep -E '~{pattern}' '~{infile}'
  >>>

  requirements {
    container: "ubuntu:latest"
  }

  output {
    Array[String] matches = read_lines(stdout())
  }
}

workflow hello {
  input {
    File infile
    String pattern
  }

  call hello_task {
    infile, pattern
  }

  output {
    Array[String] matches = hello_task.matches
  }
}
WDL as an intermediate language?
WDL or Nextflow?
Jens will draw up a pro/contra list. A decision may also be made before the next meeting.
CWL vs WDL vs Nextflow
tl;dr: CWL is [suboptimal/verbose/not used] but seems to be the best tool for our use case
Numbers
| | CWL | WDL | Nextflow |
|---|---|---|---|
| GitHub: | | | |
| # of GitHub repos | 1k | 1k | 5k |
| # of GitHub users | 163 | 249 | 1k |
| # of GitHub stars | 1.4k | 759 | 2.6k |
| # of contributors | 65 | 51 | 170 |
| Last commit to main spec repo | last year (2 weeks ago to the spec 1.2 repo) | 3 months ago | 2 days ago |
| License | Apache 2.0 | BSD 3-Clause | Apache 2.0 |
| Entries on... | | | |
| ... WorkflowHub | 81 | 12 | 129 |
| ... Dockstore | 226 | 3245 | 129 |
| ... nf-core | 0 | 0 | 97 |
CWL has a collection of common bioinformatics tools at https://github.com/common-workflow-library/bio-cwl-tools
Who?
| Language | Maintained by |
|---|---|
| CWL | Community-driven, with a governance committee (members from Arvados, Seven Bridges Genomics, University of Manchester, ..., plus 1 Galaxy and 1 WDL member) |
| WDL | Community-driven, with a governance committee (members from Chan Zuckerberg Initiative, Microsoft, Amazon, Broad Institute, DNAStack, ...) |
| Nextflow | Seqera Labs, Centre for Genomic Regulation (funding: Chan Zuckerberg Initiative, Seqera) |
Hello World Workflow (Syntax comparison)
CWL:

cwlVersion: v1.2
class: CommandLineTool
baseCommand: echo
inputs:
  message:
    type: string
    default: "Hello World"
    inputBinding:
      position: 1
outputs: []

WDL:

version 1.0

workflow HelloWorld {
  call WriteGreeting
}

task WriteGreeting {
  command {
    echo "Hello World"
  }
  output {
    File output_greeting = stdout()
  }
}

Nextflow:

params.str = 'Hello World'

process greeting {
    input:
    val greeting

    output:
    stdout

    """
    echo ${greeting}
    """
}

workflow {
    greeting(params.str)
}
Calling a Script (Syntax comparison)
CWL:

cwlVersion: v1.2
class: CommandLineTool
requirements:
  InitialWorkDirRequirement:
    listing:
      - entryname: print.py
        entry:
          $include: ./print.py
inputs:
  message:
    type: string
    default: Hello World
    inputBinding:
      position: 1
outputs:
  messages:
    type: stdout
baseCommand: [python3, ./print.py]

WDL:

version 1.1

task greeting {
  input {
    String the_input
    File the_file
  }
  command {
    python ~{the_file} ~{the_input}
  }
  output {
    File result = stdout()
  }
  runtime {
    container: "python:latest"
  }
}

workflow HelloWF {
  input {
    String the_input
    File the_file = "print.py"
  }
  call greeting {
    input:
      the_input = the_input,
      the_file = the_file
  }
  output { }
}

Nextflow:

process helloworld {
    input:
    val greeting

    output:
    file 'helloworld.txt'

    script:
    """ python3 ${projectDir}/print.py \"${greeting}\" """
}

params.greeting = "Hello World"

workflow {
    helloworld(params.greeting)
}
The local runner miniWDL does not support WDL 1.2 yet! ⛔
As far as I can see, you cannot use a local file without passing it as a parameter, unless it is part of the container. This applies to both WDL and Nextflow. Could be a dealbreaker! ⛔
To make it work with the file parameter set to its default value, one has to add a config file; otherwise miniWDL throws an error ⛔:
[file_io]
allow_any_input = true
The WDL-to-CWL translator works and produces the following file, which looks suboptimal but runs. However, this is a very simple WDL script; the translator repo contains some more complicated test cases...
WDL:

version 1.1

task greeting {
  input {
    String the_input
    File the_file
  }
  command {
    python ~{the_file} ~{the_input}
  }
  output {
    File result = stdout()
  }
  runtime {
    container: "python:latest"
  }
}

workflow HelloWF {
  input {
    String the_input
    File the_file = "print.py"
  }
  call greeting {
    input:
      the_input = the_input,
      the_file = the_file
  }
  output { }
}

CWL (translator output):

cwlVersion: v1.2
id: HelloWF
class: Workflow
requirements:
  - class: InlineJavascriptRequirement
inputs:
  - id: the_input
    type: string
  - id: the_file
    default:
      class: File
      path: print.py
    type: File
steps:
  - id: greeting
    in:
      - id: the_input
        source: the_input
      - id: the_file
        source: the_file
    out:
      - id: result
    run:
      class: CommandLineTool
      id: greeting
      inputs:
        - id: the_input
          type: string
        - id: the_file
          type: File
      outputs:
        - id: result
          type: stdout
      requirements:
        - class: InitialWorkDirRequirement
          listing:
            - entryname: script.bash
              entry: |4
                  python $(inputs.the_file.path) $(inputs.the_input)
        - class: InlineJavascriptRequirement
        - class: NetworkAccess
          networkAccess: true
      hints:
        - class: ResourceRequirement
          outdirMin: 1024
      cwlVersion: v1.2
      baseCommand:
        - bash
        - script.bash
outputs: []
I also asked ChatGPT to wrap the code into a CWL CommandLineTool using Docker, which worked surprisingly well in this case. The only issue: the file "output.txt" is not copied back to the local directory when used like this, so I had to change a very small bit (declaring the output as type stdout instead of leaving outputs empty).
ChatGPT:

cwlVersion: v1.0
class: CommandLineTool
inputs:
  input_string:
    type: string
    inputBinding:
      position: 1
outputs: []
stdout: output.txt
baseCommand: python
arguments:
  - -c
  - |
    import sys
    print(sys.argv[1])
hints:
  DockerRequirement:
    dockerPull: python:3.9

Works as expected:

cwlVersion: v1.0
class: CommandLineTool
inputs:
  input_string:
    type: string
    inputBinding:
      position: 1
outputs:
  output.txt:
    type: stdout
baseCommand: python
arguments:
  - -c
  - |
    import sys
    print(sys.argv[1])
hints:
  DockerRequirement:
    dockerPull: python:3.9

Original file from above (not using Docker):

cwlVersion: v1.2
class: CommandLineTool
requirements:
  InitialWorkDirRequirement:
    listing:
      - entryname: print.py
        entry:
          $include: ./print.py
inputs:
  message:
    type: string
    default: Hello World
    inputBinding:
      position: 1
outputs:
  messages:
    type: stdout
baseCommand: [python3, ./print.py]
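Why does sys.argv[1] in the ChatGPT version receive the input string? CWL builds the command line from `baseCommand`, followed by `arguments` and bound inputs sorted by `position` (default 0). The sketch below is a heavily simplified model of that ordering, written only for illustration. It handles plain string arguments and scalar or File values, and ignores `prefix`, `separate`, `valueFrom` and arrays. It also uses its own tie-breaking (arguments before inputs, inputs by name). The function name is hypothetical; real runners such as cwltool follow the full spec.

```python
def build_command_line(tool, job):
    """Simplified CWL command-line assembly (illustration only, not the full spec)."""
    base = tool.get("baseCommand", [])
    command = [base] if isinstance(base, str) else list(base)
    keyed = []
    for index, arg in enumerate(tool.get("arguments", [])):
        if not isinstance(arg, str):
            raise NotImplementedError("only plain string arguments are handled in this sketch")
        # Plain string arguments sit at the implicit position 0.
        keyed.append(((0, 0, index), arg))
    for name, spec in tool.get("inputs", {}).items():
        binding = spec.get("inputBinding")
        if binding is None:
            continue  # no inputBinding: the value is not placed on the command line
        value = job.get(name, spec.get("default"))
        if value is None:
            continue
        if isinstance(value, dict) and value.get("class") == "File":
            value = value["path"]  # real runners substitute the staged path instead
        keyed.append(((binding.get("position", 0), 1, name), str(value)))
    # The middle key element keeps arguments (0) ahead of inputs (1) on equal positions.
    return command + [value for _, value in sorted(keyed)]


chatgpt_tool = {
    "baseCommand": "python",
    "arguments": ["-c", "import sys\nprint(sys.argv[1])\n"],
    "inputs": {"input_string": {"type": "string", "inputBinding": {"position": 1}}},
}
print(build_command_line(chatgpt_tool, {"input_string": "Hello World"}))
# ['python', '-c', 'import sys\nprint(sys.argv[1])\n', 'Hello World']
```

With `python -c <code> <arg>`, Python sets sys.argv[0] to '-c', so the input string arrives as sys.argv[1].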
The target format should still be CWL, as this is what the community accepts. Since a WDL-to-CWL translator is available, one could also encourage users to write WDL. However, the translator produces a single file, so its individual CWL CommandLineTools cannot be mixed and matched without manually splitting that file. No Nextflow converter is available, since Nextflow is far more powerful. One could implement one for a subset of features, as the typical use case seems to be script execution: unlike in the bioinformatics field, there are no widespread tools.
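The manual splitting of the translator output could be scripted, because each step embeds its tool under `run`. Below is a minimal sketch assuming the document has already been loaded into a dict (for example with a YAML parser). `split_inline_tools` and the `<step id>.cwl` naming are hypothetical choices. It handles both the list form of `steps` that the translator emits and the map form. It does not copy workflow-level `requirements` into the extracted tools; in the example above the tool already declares its own.

```python
import copy


def split_inline_tools(workflow, name_pattern="{step_id}.cwl"):
    """Return (workflow with file references, {filename: tool}) for inline CommandLineTools."""
    workflow = copy.deepcopy(workflow)  # leave the caller's document untouched
    steps = workflow.get("steps", [])
    # Steps may be a list of {"id": ...} entries (translator output) or a map keyed by id.
    if isinstance(steps, dict):
        step_items = list(steps.items())
    else:
        step_items = [(step["id"], step) for step in steps]
    tools = {}
    for step_id, step in step_items:
        run = step.get("run")
        if isinstance(run, dict) and run.get("class") == "CommandLineTool":
            filename = name_pattern.format(step_id=step_id)
            # A standalone tool file needs its own cwlVersion.
            run.setdefault("cwlVersion", workflow.get("cwlVersion", "v1.2"))
            tools[filename] = run
            step["run"] = filename  # the step now points at the standalone file
    return workflow, tools
```

Each returned tool can then be written with `json.dump`, since CWL documents may be JSON as well as YAML, and reused on its own in other workflows.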
Judging by the r/bioinformatics subreddit, Nextflow is the only one of these three languages adopted widely enough to have recent threads about it.
One could still consider GUI tools like Rabix Composer for CWL (which is deprecated), as HELIPORT seems to use it judging from screenshots, or use templates for these special use cases.
Other Consortia
According to its latest proposal, NFDI4Biodiv plans to use Nextflow, which is also what CloWM (developed by NFDI4Microbiota) supports. DataPLANT, by contrast, uses/wants CWL for ARCs.
Note: There is a requirements document (Requirements on workflow tools) from NFDI4Ing available: https://nfdi4ingscientificworkflowrequirements.readthedocs.io/en/latest/docs/requirements.html#evaluation
Opinionated Pro/Contra List
| | CWL | WDL | Nextflow | Comment |
|---|---|---|---|---|
| General: | | | | |
| Verbosity | ❌ | ✅ | ✅ | CWL is mega verbose |
| Documentation | 🔘 | 🔘 | ✅ | overall ok |
| Script execution in Docker | ✅ | ❌ | ❌ | works best in CWL; "requirements" can be cool |
| Working with containers | ✅ | ✅ | ✅ | works with all; default in WDL |
| Output into file system | ✅ | ❌ | ❌ | CWL outputs the requested files; the others spam logs into the file system |
| Speed | 🔘 | 🔘 | 🔘 | all about the same |
| Parsable | ✅ | 🔘 | 🔘 | 🔘 = grammar available; CWL is YAML |
| Simplicity: | | | | |
| Official GUI | ❌ | ❌ | ❌ | Galaxy has one |
| Ease of first use | ❌ | 🔘 | 🔘 | |
| Metadata | ✅ | ❌ | ❌ | only CWL supports annotations |
| Conversion: | | | | |
| Convertible to CWL | ◽ | ✅ | ❌ | |
| Convertible to WDL | 🔘/❌ | ◽ | ❌ | CWL: outdated tool |
| Convertible to Nextflow | 🔘/❌ | ❌ | ◽ | CWL: outdated tool |
| Community: | | | | |
| Size of community | 🔘/🔘 | 🔘/✅ | ✅/🔘 | by # of repos and tools |
| Forum/StackOverflow/Reddit | ❌ | ❌ | ✅ | Nextflow most active |
✅ good
🔘 ok
❌ bad
◽ not applicable
However, simple bash scripts would win most categories 😜
After all the testing I did, CWL still seems to be the right choice for our use cases (most likely executing an existing script inside a container). CWL may be verbose, but it gives users the most control.
For bioinformatics, Nextflow and WDL are certainly the best choices, as all the tools are already available. It is also a small set of command-line tools that can easily be installed, so they are most likely the best pick there. However, both expect scripts (like the BL ones) as part of their inputs, which makes them overridable. One would then just have to wrap "Rscript" as a tool, which is fine, but the README graph, for example, would then show "Rscript" for every step...
Opinions?
Superseded by https://github.com/fairagro/m4.4_concept/issues/12