aws/amazon-genomics-cli

Workflow definitions that include a `workflow.zip` file fail parsing in CromwellWESAdapter

wleepang opened this issue · 1 comments

Describe the Bug

Running a multi-file workflow that includes a workflow.zip file as part of its definition produces the following error with agc workflow run:

2022-12-06T17:47:00-08:00 ✘   error="unable to run workflow: 500 Internal Server Error"
Error: an error occurred invoking 'workflow run'

Steps to Reproduce

  1. Create a multi-file WDL workflow definition
  2. Add a zip file called workflow.zip to the definition folder
  3. Add the workflow to agc-project.yaml
  4. Start a Cromwell context
  5. Run the workflow
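
Steps 1 and 2 can be scripted. The following is a minimal sketch, not the exact definition used in this report: the folder name my_workflow, the file names main.wdl and tasks.wdl, and the placeholder file contents are all hypothetical. Only the zip file name workflow.zip matters for triggering the bug.

```python
import os
import tempfile
import zipfile


def build_definition(root: str) -> str:
    """Create a definition folder holding main.wdl plus a workflow.zip of modules."""
    # Hypothetical folder name; any workflow name would do.
    definition = os.path.join(root, "my_workflow")
    os.makedirs(definition, exist_ok=True)

    # Placeholder main workflow file (WDL comments start with '#').
    with open(os.path.join(definition, "main.wdl"), "w") as fh:
        fh.write("# placeholder: main workflow goes here\n")

    # The extra modules are bundled under the problematic name workflow.zip.
    with zipfile.ZipFile(os.path.join(definition, "workflow.zip"), "w") as bundle:
        bundle.writestr("tasks.wdl", "# placeholder: imported tasks go here\n")

    return definition


if __name__ == "__main__":
    print(build_definition(tempfile.mkdtemp()))
```

The resulting folder still has to be referenced from agc-project.yaml and run in a Cromwell context (steps 3 to 5).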

Relevant Logs

Context adapter log contains the following:

Tue, 06 Dec 2022 17:19:28 -0800 [ERROR] 2022-12-07T01:19:28.581Z        091259f7-8c63-4b27-9362-72f6f91e9125    Exception on /ga4gh/wes/v1/runs [POST]
Traceback (most recent call last):
  File "/var/task/amazon_genomics/wes/adapters/CromwellWESAdapter.py", line 514, in get_workflow_from_s3
    props = parse_workflow_zip_file(file, workflow_type)
  File "/var/task/amazon_genomics/wes/adapters/CromwellWESAdapter.py", line 555, in parse_workflow_zip_file
    zip.extractall(wd)
  File "/var/lang/lib/python3.9/zipfile.py", line 1642, in extractall
    self._extract_member(zipinfo, path, pwd)
  File "/var/lang/lib/python3.9/zipfile.py", line 1697, in _extract_member
    shutil.copyfileobj(source, target)
  File "/var/lang/lib/python3.9/shutil.py", line 205, in copyfileobj
    buf = fsrc_read(length)
  File "/var/lang/lib/python3.9/zipfile.py", line 924, in read
    data = self._read1(n)
  File "/var/lang/lib/python3.9/zipfile.py", line 992, in _read1
    data += self._read2(n - len(data))
  File "/var/lang/lib/python3.9/zipfile.py", line 1027, in _read2
    raise EOFError
EOFError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/amazon_genomics/wes/adapters/CromwellWESAdapter.py", line 239, in run_workflow
    props = get_workflow_from_s3(workflow_url, tmpdir, workflow_type)
  File "/var/task/amazon_genomics/wes/adapters/CromwellWESAdapter.py", line 516, in get_workflow_from_s3
    raise RuntimeError(f"{s3_uri} is not a valid workflow.zip file: {e}")
RuntimeError: s3://agc-111122223333-us-west-2/project/orca/userid/pwymingJKP3z/context/cromwellCtx/workflow/broad_gtex/workflow.zip is not a valid workflow.zip file: 
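
Note that the RuntimeError message ends with nothing after the final colon. That is consistent with the underlying EOFError carrying no message: formatting such an exception with {e} yields an empty string. A small sketch of the effect, using a shortened version of the message text; the {e!r} variant is only a suggestion and is not something the adapter does today:

```python
try:
    raise EOFError
except EOFError as e:
    # str() of an exception raised without arguments is empty.
    plain = f"not a valid workflow.zip file: {e}"
    # repr() keeps the exception type visible in the log.
    detailed = f"not a valid workflow.zip file: {e!r}"

print(repr(plain))
print(repr(detailed))
```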

Expected Behavior

Workflow definitions that are accompanied by extra modules bundled as a zip file should run regardless of what the module bundle zip is named.

Actual Behavior

Additional Context

Proposed fix:

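The workflow definition bundle needs to be extracted to a distinct folder. The following line in parse_workflow_zip_file (CromwellWESAdapter.py, line 555 in the traceback above):

zip.extractall(wd)

should be replaced with something like:

zip.extractall(path='path/to/tmpdir')

where path/to/tmpdir is different from wd, which is currently set to the parent folder of the downloaded workflow definition bundle. Extracting into that folder presumably lets a bundled workflow.zip overwrite the downloaded archive while it is still being read, which would explain the EOFError.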

Operating System: macOS
AGC Version: 1.5.2
Was AGC setup with a custom bucket: No
Was AGC setup with a custom VPC: No

Doing the following should be a sufficient fix:

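# ... preceding code unchanged ...
# Extract into a dedicated subfolder so that a bundled workflow.zip
# cannot land on top of the archive being read.
wd = path.join(path.dirname(file), 'workflow')
with zipfile.ZipFile(file) as zip:
    zip.extractall(wd)
    # ... rest of code unchanged ...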
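
For anyone who wants to check the idea outside the adapter, here is a self-contained sketch of the same approach. It is not the adapter's actual code: the function name extract_bundle and the subdir parameter are hypothetical, and only the subfolder name 'workflow' is taken from the snippet above.

```python
import os
import zipfile


def extract_bundle(bundle_path: str, subdir: str = "workflow") -> str:
    """Extract bundle_path into a subfolder next to it and return that folder."""
    parent = os.path.dirname(os.path.abspath(bundle_path))
    target = os.path.join(parent, subdir)
    os.makedirs(target, exist_ok=True)

    # Members are written under target, never into parent, so a member
    # named workflow.zip cannot replace the archive that is being read.
    with zipfile.ZipFile(bundle_path) as archive:
        archive.extractall(target)

    return target
```

With this layout a bundle at some/dir/workflow.zip is unpacked into some/dir/workflow/, and the downloaded archive itself is left untouched even when it contains another workflow.zip.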