common-workflow-language/cwljava

Enum types are not parsed correctly

jdidion opened this issue · 6 comments

After packing https://github.com/common-workflow-language/cwl-v1.2/blob/1.2.1_proposed/tests/anon_enum_inside_array_inside_schemadef.cwl using cwlpack and then parsing with cwljava, the enum has symbols like "file:/Users/jdidion/projects/cwlScala/target/test-classes/CommandLineTools/conformance/#anon_enum_inside_array_inside_schemadef.cwl/first/user_type_2/species/homo_sapiens" rather than just "homo_sapiens". Packed workflow below.

{
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "requirements": [
        {
            "class": "InlineJavascriptRequirement"
        }
    ],
    "inputs": [
        {
            "type": {
                "name": "user_type_2",
                "type": "record",
                "fields": [
                    {
                        "type": [
                            "null",
                            {
                                "type": "enum",
                                "symbols": [
                                    "homo_sapiens",
                                    "mus_musculus"
                                ]
                            }
                        ],
                        "name": "species"
                    },
                    {
                        "type": [
                            "null",
                            {
                                "type": "enum",
                                "symbols": [
                                    "GRCh37",
                                    "GRCh38",
                                    "GRCm38"
                                ]
                            }
                        ],
                        "name": "ncbi_build"
                    }
                ]
            },
            "id": "first"
        }
    ],
    "baseCommand": "echo",
    "arguments": [
        {
            "prefix": "species",
            "valueFrom": "$(inputs.first.species)"
        },
        {
            "prefix": "ncbi_build",
            "valueFrom": "$(inputs.first.ncbi_build)"
        }
    ],
    "outputs": [
        {
            "type": "stdout",
            "id": "result"
        }
    ],
    "id": "anon_enum_inside_array_inside_schemadef.cwl"
}
mr-c commented

The Python version has the same issue, so I'll look into the schema itself next.

from pathlib import Path
from typing import List

from cwl_utils.parser import load_document_by_uri
import cwl_utils.parser.cwl_v1_2 as cwl

cwl_file = Path("/home/michael/cwljava/src/test/resources/org/w3id/cwl/cwl1_2/utils/valid_anon_enum_inside_array_inside_schemadef.cwl")

# Load and parse the test document.
cwl_obj = load_document_by_uri(cwl_file)

tool: cwl.CommandLineTool = cwl_obj
# The first requirement is the SchemaDefRequirement that holds the record type.
schemaDef: cwl.SchemaDefRequirement = tool.requirements[0]
recordDef: cwl.CommandInputRecordSchema = schemaDef.types[0]
recordID: str = recordDef.name
print(f"Record name: {recordID}.")
ncbi_build_field: cwl.CommandInputRecordField = recordDef.fields[0]
#print(f"ncbi_build_field: {ncbi_build_field.save()}")
# The enum schema is the second entry of the field's type list.
ncbi_build_enum: cwl.CommandInputEnumSchema = ncbi_build_field.type[1]
ncbi_build_symbols: List[str] = ncbi_build_enum.symbols
print(f"ncbi_build_symbols: {ncbi_build_symbols}.")

$ python cwljava_70.py
Record name: file:///home/michael/cwljava/src/test/resources/org/w3id/cwl/cwl1_2/utils/valid_anon_enum_inside_array_inside_schemadef.cwl#vcf2maf_params.
ncbi_build_symbols: ['file:///home/michael/cwljava/src/test/resources/org/w3id/cwl/cwl1_2/utils/valid_anon_enum_inside_array_inside_schemadef.cwl#vcf2maf_params/ncbi_build/GRCh37', 'file:///home/michael/cwljava/src/test/resources/org/w3id/cwl/cwl1_2/utils/valid_anon_enum_inside_array_inside_schemadef.cwl#vcf2maf_params/ncbi_build/GRCh38', 'file:///home/michael/cwljava/src/test/resources/org/w3id/cwl/cwl1_2/utils/valid_anon_enum_inside_array_inside_schemadef.cwl#vcf2maf_params/ncbi_build/GRCm38']
mr-c commented

See common-workflow-language/schema_salad#506 for a deep dive.

The best I can offer in the short term is to add a helper function in a utility Java class that cleans up the symbol names (or you could do so from Scala).

Hi @mr-c, I wonder whether the implementation of Enum as string in commit 8af4104 should be included in the current cwljava?

The latest cwl-v1.2.0 (@61ddfed9f384) still parses the enum symbols as full URIs, as John mentioned, and it seems to still use the URI loader.

git diff 8af4104d05fe988c3aff8c3a0ddd67f7e4eba77b 61ddfed9f384aa6d27fdf0c0d7aefd474c1de440 src/main/java/org/w3id/cwl/cwl1_2/EnumSchemaImpl.java

gives

@@ -88,7 +88,7 @@ public class EnumSchemaImpl extends SavableImpl implements EnumSchema {
     try {
       symbols =
           LoaderInstances
-              .array_of_StringInstance
+              .uri_array_of_StringInstance_True_False_None
               .loadField(__doc.get("symbols"), __baseUri, __loadingOptions);
     } catch (ValidationException e) {
       symbols = null; // won't be used but prevents compiler from complaining.

Let me know what I should do to make it work. Thanks!

Hi @mr-c, is it possible to get an update on this? It would be greatly appreciated if you could merge the linked fix, as it would unblock us. Thanks!

mr-c commented

Hello @YuxinShi0423 and @commandlinegirl

The bulk of cwljava is generated by running schema-salad-tool --codegen java (from https://github.com/common-workflow-language/schema_salad/) against the in-development version of CWL v1.2.1 at https://github.com/common-workflow-language/cwl-v1.2/blob/1.2.1_proposed/CommonWorkflowLanguage.yml (which is semantically the same as CWL v1.2, but with clarifications to the text and one change to the schema that encodes the allowed values of the class field more explicitly).

There is a long discussion in common-workflow-language/schema_salad#506 about how enums are to be interpreted in CWL. At this time we can make no changes to the schema in this regard, as doing so would break backwards compatibility.

87bc4b2 is derived from a schema change proposal that was rejected, so it cannot be merged.

Instead, for DNAnexus I recommend processing enum values to get the "short name", following the rules at https://www.commonwl.org/v1.1/SchemaSalad.html#Short_names , as discussed in common-workflow-language/schema_salad#511.
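
For illustration, here is a minimal sketch of that post-processing. It assumes the short-name rule reduces to taking the text after the final `/` of the fragment, or of the path when the identifier has no fragment. The name `short_name` is hypothetical and not part of cwljava or cwl-utils. It is written in Python to match the reproduction script above, and the same logic should port directly to a Java or Scala helper.

```python
from typing import List
from urllib.parse import urlsplit


def short_name(identifier: str) -> str:
    """Return the short name of a fully expanded identifier.

    Hypothetical helper: keeps the text after the last "/" of the
    fragment, or of the path if the identifier has no fragment.
    """
    parts = urlsplit(identifier)
    if parts.fragment:
        return parts.fragment.split("/")[-1]
    return parts.path.split("/")[-1]


def short_symbols(symbols: List[str]) -> List[str]:
    """Apply short_name to every enum symbol, preserving order."""
    return [short_name(symbol) for symbol in symbols]


if __name__ == "__main__":
    expanded = [
        "file:///home/michael/cwljava/src/test/resources/org/w3id/cwl/cwl1_2/utils/"
        "valid_anon_enum_inside_array_inside_schemadef.cwl#vcf2maf_params/ncbi_build/GRCh37",
        "file:///home/michael/cwljava/src/test/resources/org/w3id/cwl/cwl1_2/utils/"
        "valid_anon_enum_inside_array_inside_schemadef.cwl#vcf2maf_params/ncbi_build/GRCh38",
    ]
    print(short_symbols(expanded))
```

Symbols that are already short (for example a bare `homo_sapiens`) pass through unchanged, so the helper can be applied whether or not the parser expanded them.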