aws/amazon-genomics-cli

AGC run fails when empty strings in inputs.json file

biofilos opened this issue · 4 comments

Describe the Bug

If the inputs.json of a workflow contains empty strings, for example:

{
    "wf.something": 1,
    "wf.thisWillCrash": ""
}

When I run agc workflow run, the command fails, complaining that the location of the inputs.json file is a directory. For some reason, the command strips the file name from the inputs.json path and continues processing the containing directory as if it were the inputs.json.
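One plausible explanation, which is not confirmed anywhere in this thread, is that string values are resolved as local paths relative to the directory of the inputs file, so an empty string resolves to that directory itself. The Python sketch below only illustrates that path arithmetic. It is not AGC code, and `resolve_candidate` is a hypothetical helper.

```python
import os
import tempfile


def resolve_candidate(inputs_file: str, value: str) -> str:
    """Resolve an input value against the directory holding the inputs file.

    Hypothetical helper: it mimics a tool that treats string inputs as
    possible local file paths. This is an assumption, not AGC's actual code.
    """
    return os.path.join(os.path.dirname(inputs_file), value)


with tempfile.TemporaryDirectory() as tmp:
    inputs_file = os.path.join(tmp, "inputs.json")
    open(inputs_file, "w").close()

    # An empty string value resolves to the directory plus a trailing separator.
    candidate = resolve_candidate(inputs_file, "")
    print(candidate.endswith(os.sep))  # True
    print(os.path.isdir(candidate))    # True
```

This would be consistent with the trailing slash in the path reported in the logs (`.../aws/inputs/: is a directory`), but the real cause has to be confirmed in the AGC source.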

Although this bug should be fixed (or the behavior explained in the documentation), a workaround is to avoid empty strings in the inputs.json: either use empty strings as default values in the WDL file itself, or use non-empty values in the inputs.json and handle them with logic in the WDL file.
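Until this is fixed, a small pre-flight check can flag the offending keys before calling agc workflow run. This is a minimal Python sketch and not part of AGC. `find_empty_strings` is a hypothetical helper, and the sample data is the example from this report.

```python
import json


def find_empty_strings(value, path=""):
    """Return the key paths of every empty-string value in parsed JSON data."""
    if value == "":
        return [path]
    if isinstance(value, dict):
        return [
            hit
            for key, item in value.items()
            for hit in find_empty_strings(item, f"{path}.{key}" if path else key)
        ]
    if isinstance(value, list):
        return [
            hit
            for index, item in enumerate(value)
            for hit in find_empty_strings(item, f"{path}[{index}]")
        ]
    return []


# The inputs from this report, parsed the same way a file would be.
inputs = json.loads('{"wf.something": 1, "wf.thisWillCrash": ""}')
print(find_empty_strings(inputs))  # ['wf.thisWillCrash']
```

To check a real file, load it with `json.load` and pass the result to the same function. Any key it reports can then be given a default in the WDL or a non-empty placeholder in the inputs.json.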

Steps to Reproduce

  1. Include an empty string as a value in input.json
  2. Run agc workflow run wf_name --inputsFile input.json

Relevant Logs

When running the workflow with empty strings in input.json

2022-10-27T17:06:29+08:00 𝒊  Running workflow. Workflow name: 'variants', InputsFile: 'inputs/variants.inputs.json', OptionFile: '', Context: 'ctx1'
2022-10-27T17:06:30+08:00 ✘   error="unable to run workflow: unable to sync s3://agc-xxxxxx-ap-xxx-1/project/testProject/userid/userSDFHG/data: upload multipart failed, upload id: uPenv1AiRi0I55mR1X48ppYXYmHNy.t_0uWPDkIC62xkPlHzHGchJvBcA9Vh7isnJqpLZvw6.N8XC2OHQ2_zXtfQc1cuxc_0_BE2zgNQ_0f2u1dkkUMm_czBd86pZPDQ9gF_USm7D69KGdNgO6GUtg--, cause: operation error S3: UploadPart, failed to compute payload hash: failed to compute payload hash, read /home/user/Documents/projects/scalable-workflows/aws/inputs/: is a directory"
Error: an error occurred invoking 'workflow run'
with variables: {WorkflowName:variants Arguments:inputs/variants.inputs.json OptionFile: ContextName:ctx1}
caused by: unable to run workflow: unable to sync s3://agc-662002918436-ap-southeast-1/project/testProject/userid/jfortiz4vr6sV/data: upload multipart failed, upload id: uPenv1AiRi0I55mR1X48ppYXYmHNy.t_0uWPDkIC62xkPlHzHGchJvBcA9Vh7isnJqpLZvw6.N8XC2OHQ2_zXtfQc1cuxc_0_BE2zgNQ_0f2u1dkkUMm_czBd86pZPDQ9gF_USm7D69KGdNgO6GUtg--, cause: operation error S3: UploadPart, failed to compute payload hash: failed to compute payload hash, read /home/user/Documents/projects/scalable-workflows/aws/inputs/: is a directory

Expected Behavior

Workflow runs

Actual Behavior

AGC fails, claiming that it cannot compute the payload hash of the directory that contains the input.json file.

Additional Context

Operating System: Ubuntu 22.10
AGC Version: 1.5.1
Was AGC setup with a custom bucket: No
Was AGC setup with a custom VPC: No

Confirmed the bug by adding a "wf.thisWillCrash": "" line to this inputs.json.

We will add fixing this to our backlog.

We could certainly provide a more informative error message.

Is there a useful reason to provide an empty input? Wouldn't it be better to make the input optional in the WDL?

Yes.
We rely on annotation from our users, and we validate all input JSON files, so all fields for a specific workflow should be present. Adding that level of validation inside the WDL itself just increases the complexity of the workflow unnecessarily.

If this bug cannot be fixed, it would be good to document that no empty fields are allowed in the wdl file, so we can act accordingly.

It should be fixable. Was mainly wondering if we should allow empty values or just emit a better error. Seems like there's a case for empty values so we should allow them.