Some WDL Maps are not handled
dinithiW opened this issue · 8 comments
Currently, the WDL input compound type Map[]
is not handled by the translator. (https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#mapp-y)
A workflow using Maps is given in this workflow (https://github.com/biowdl/tasks/blob/bc1bacf11498d2d30b85591cfccdcf71ef0966a5/star.wdl#L144)
edit by @mr-c , we now correctly translate non-input variables of type Map
as demonstrated by #189 but I can't find any examples of real-world WDL workflows with inputs of type Map
; currently https://github.com/broadinstitute/warp/blob/b09880a71e3d3e42fa4b544d03aea23c0a246efc/pipelines/broad/dna_seq/germline/joint_genotyping/by_chromosome/JointGenotypingByChromosomePartOne.wdl#L46 does not translate, but that is probably more to due with the use of the Set
type
I am looking at this, and I am not exactly sure what I should do. Is it just to create a translation from WDL to CWL like here or am I missing something obvious?
Oh I should create a class for map, is that it?
@RaOneG Pardon my delay
In this instance,
Map[String, String] samOutputNames = {"BAM SortedByCoordinate": "sortedByCoord.out.bam"}
is not in the WDLinputs
section, it is a private declaration.
So to convert this example we would need to figure out how to represent the only other place samOutputNames
is used. Here it is in the outputs
section, to calculate the name of the output file:
File bamFile = outFileNamePrefix + "Aligned." + samOutputNames[outSAMtype]
So the strategy for Map
used outside the inputs
section (and not used as the type of one of the outputs
) is to
- Convert that private declaration to Javascript. In this example that would be
var samOutputNames = Map([ ["BAM SortedByCoordinate", "sortedByCoord.out.bam"] ];
- Then continue as normal.
For our example, that would be producing
${
var samOutputNames = new Map([ ["BAM SortedByCoordinate", "sortedByCoord.out.bam"] ]);
return inputs.outFileNamePrefix + "Aligned." + samOutputNames.get(inputs.outSAMtype);
}
for the output_glob
at
wdl-cwl-translator/wdl2cwl/main.py
Line 744 in 3e287fb
However, the Javascript Map
type is not part of ECMAScript 5.1, which is the version of Javascript that CWL v1.x Expressions use
While in this instance we can rely on the Map
like qualities of object
in Javascript because all the keys in samOutputNames
are strings, there may be regular uses of WDL's Map
type that use other types of keys, like numbers.
So for now we can use a Javascript Object instead:
${
var samOutputNames = {"BAM SortedByCoordinate": "sortedByCoord.out.bam"};`
return inputs.outFileNamePrefix + "Aligned." + samOutputNames[inputs.outSAMtype];
}
But eventually we will have to find another solution like adapting this polyfill that brings the ECMAScript 6 Map
to ECMAscript 5: https://github.com/zloirock/core-js/blob/v2.6.12/modules/es6.map.js
A simpler polyfill for ES 6's Map
is https://github.com/anonyco/Javascript-Fast-Light-Map-WeakMap-Set-And-WeakSet-JS-Polyfill/blob/07595a2acb8df921c6481024218200c441f87621/mapPolyfill.src.js (adapted for our use at https://github.com/common-workflow-lab/wdl-cwl-translator/blob/32e9c5dd463c113e63f056f933201a1a7a44567b/map.js)
Hi @mr-c,
I am sorry but this is a little difficult for me to understand in addition to I have no prior knowledge of Javascript. Is there something I can learn/read to get some more in-depth knowledge about what you mentioned above apart from the Javascript part?
Hello @RaOneG
A very basic level of JavaScript understanding will be necessary. I personally know very little Javascript, instead I constantly do web searches to remind myself of the correct syntax. I literally Google searched for "if then else javascript" the other week!
As for my comments on this issue, you can ignore everything I wrote about "polyfill"s and ECMAScript Map
.
I asked in the OpenWDL slack for a real-world example of a Map
input https://openwdl.slack.com/archives/CT0QEK1V0/p1646937620784419 ; pending a real-world usage or user request, I think this is now a low-priority issue
@dpark01 writes
so from a high level, the reason I find myself using
read_json
orMap
structures in general in WDL is when I'm
- scattering on X
- need to track/coordinate metadata on a per-X basis, either post-gather or even inside the scatter block
this particular example task is an interesting one where I take a bunch of same-length Array inputs and basically zip them together and then deduplicate them based on one of the strings: that is, it's assumed that the biosamples input array may contain non-unique values, so the output of this task produces an output Array of unique biosamples and then a bunch of Maps that let the workflow code query metadata inside a scatter block for each biosample.
in an unrelated example, here's a workflow that calls other tasks that emit Map outputs and then looks up values in that Map in the middle of a scatter block in order to know what to do differently with each sample: