Workflow definitions that include a `workflow.zip` file fail parsing in CromwellWESAdapter
wleepang opened this issue · 1 comments
Describe the Bug
Running a multi-file workflow that includes a `workflow.zip` file as part of its definition produces the following error with `agc workflow run`:
2022-12-06T17:47:00-08:00 ✘ error="unable to run workflow: 500 Internal Server Error"
Error: an error occurred invoking 'workflow run'
Steps to Reproduce
- Create a multi-file WDL workflow definition
- Add a zip file called `workflow.zip` to the definition folder
- Add the workflow to `agc-project.yaml`
- Start a Cromwell context
- Run the workflow
Relevant Logs
Context adapter log contains the following:
Tue, 06 Dec 2022 17:19:28 -0800 [ERROR] 2022-12-07T01:19:28.581Z 091259f7-8c63-4b27-9362-72f6f91e9125 Exception on /ga4gh/wes/v1/runs [POST]
Traceback (most recent call last):
File "/var/task/amazon_genomics/wes/adapters/CromwellWESAdapter.py", line 514, in get_workflow_from_s3
props = parse_workflow_zip_file(file, workflow_type)
File "/var/task/amazon_genomics/wes/adapters/CromwellWESAdapter.py", line 555, in parse_workflow_zip_file
zip.extractall(wd)
File "/var/lang/lib/python3.9/zipfile.py", line 1642, in extractall
self._extract_member(zipinfo, path, pwd)
File "/var/lang/lib/python3.9/zipfile.py", line 1697, in _extract_member
shutil.copyfileobj(source, target)
File "/var/lang/lib/python3.9/shutil.py", line 205, in copyfileobj
buf = fsrc_read(length)
File "/var/lang/lib/python3.9/zipfile.py", line 924, in read
data = self._read1(n)
File "/var/lang/lib/python3.9/zipfile.py", line 992, in _read1
data += self._read2(n - len(data))
File "/var/lang/lib/python3.9/zipfile.py", line 1027, in _read2
raise EOFError
EOFError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/task/amazon_genomics/wes/adapters/CromwellWESAdapter.py", line 239, in run_workflow
props = get_workflow_from_s3(workflow_url, tmpdir, workflow_type)
File "/var/task/amazon_genomics/wes/adapters/CromwellWESAdapter.py", line 516, in get_workflow_from_s3
raise RuntimeError(f"{s3_uri} is not a valid workflow.zip file: {e}")
RuntimeError: s3://agc-111122223333-us-west-2/project/orca/userid/pwymingJKP3z/context/cromwellCtx/workflow/broad_gtex/workflow.zip is not a valid workflow.zip file:
Expected Behavior
Workflow definitions that are accompanied by extra modules bundled as a zip file should run regardless of what the module bundle zip is named.
Actual Behavior
Screenshots
Additional Context
Proposed fix:
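The workflow definition bundle needs to be extracted to a distinct folder. The following line (line 555 of CromwellWESAdapter.py in the traceback above):
zip.extractall(wd)
should be replaced with something like:
zip.extractall(path='path/to/tmpdir')
where `path/to/tmpdir` is different from `wd`, which is currently set to the parent folder of the downloaded workflow definition bundle. Because the bundle itself is named `workflow.zip`, a member with the same name extracted into that folder presumably overwrites the archive while it is still being read, which would explain the EOFError.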
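As a rough illustration of the idea, here is a minimal standalone sketch, not the adapter's actual code. The helper names `extract_bundle` and `make_demo_bundle` are made up for this example, and it assumes the only requirement is that the extraction directory differs from the folder holding the downloaded bundle.

```python
import io
import os
import tempfile
import zipfile


def extract_bundle(bundle_path: str) -> str:
    """Extract a workflow bundle into a fresh subdirectory and return its path.

    Hypothetical helper for illustration; not part of CromwellWESAdapter.
    """
    # Use a new directory next to the bundle so that no archive member,
    # including one named workflow.zip, can be written over the bundle itself.
    extract_dir = tempfile.mkdtemp(prefix="workflow-", dir=os.path.dirname(bundle_path))
    with zipfile.ZipFile(bundle_path) as bundle:
        bundle.extractall(path=extract_dir)
    return extract_dir


def make_demo_bundle(directory: str) -> str:
    """Build a bundle named workflow.zip that itself contains a workflow.zip member."""
    # Inner archive standing in for the extra modules zipped by the user.
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as modules:
        modules.writestr("tasks.wdl", "version 1.0\n")

    bundle_path = os.path.join(directory, "workflow.zip")
    with zipfile.ZipFile(bundle_path, "w") as bundle:
        bundle.writestr("main.wdl", "version 1.0\n")
        bundle.writestr("workflow.zip", inner.getvalue())
    return bundle_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        bundle_path = make_demo_bundle(tmpdir)
        extracted = extract_bundle(bundle_path)
        print(sorted(os.listdir(extracted)))
```

With this layout the nested `workflow.zip` lands inside the new subdirectory, and the downloaded bundle is left untouched while it is being read.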
Operating System: macOS
AGC Version: 1.5.2
Was AGC setup with a custom bucket: No
Was AGC setup with a custom VPC: No
Doing the following should be a sufficient fix:
# rest of code ...
wd = path.join(path.dirname(file), 'workflow')
with zipfile.ZipFile(file) as zip:
    # rest of code ...