Introduce computation stages - validation, download, prepare, ...
unode opened this issue · 1 comment
Currently NGLess uses two stages for execution: a first stage verifies that the script and output files are consistent (equivalent to `--validate-only`), and a second stage, where computation happens, runs only if the first stage finishes successfully.
However, the current implementation performs downloads, indexing and computation during the same (second) stage.
If using the `parallel` module, this can lead to jobs waiting on each other for significant amounts of time. This happens during indexing and initialization of internal and external modules, as well as during downloads, where connectivity problems or slow network speeds can cause failures or delays.
For example, mapping to `hg19` only downloads and indexes the files when the `map()` step is reached for the first time.
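The waiting described above can be illustrated with a small, self-contained shell sketch. It is not NGLess's implementation: it assumes, purely for illustration, that concurrent jobs coordinate through a lock directory, and `ensure_resource`, `download_and_index` and the `.lock`/`.done` paths are hypothetical names. The first job to need a resource does the (simulated) download and indexing, and every other job just polls until it is done.

```bash
# Illustrative sketch only, NOT NGLess's actual code. All names are hypothetical.
# Placeholder for a slow download + index step (e.g. for a reference like hg19).
download_and_index() {
    sleep 1                                 # pretend to download and index
    printf '%s\n' "$1" >> "$2/fetch.log"    # record that the expensive work ran
}

# ensure_resource NAME DIR: make NAME available in DIR, doing the work at most
# once even when several jobs call it at the same time. Jobs that lose the race
# for the lock directory poll until the winner publishes NAME.done.
# (A job that dies while holding the lock would leave the others waiting forever.)
ensure_resource() {
    name=$1
    dir=$2
    while [ ! -e "$dir/$name.done" ]; do
        if mkdir "$dir/$name.lock" 2>/dev/null; then
            # Re-check under the lock: another job may have finished meanwhile.
            if [ ! -e "$dir/$name.done" ]; then
                download_and_index "$name" "$dir"
                touch "$dir/$name.done"
            fi
            rmdir "$dir/$name.lock"
        else
            sleep 0.1   # idle: this is the waiting described in the issue
        fi
    done
}
```

Starting three copies of `ensure_resource hg19 "$dir" &` at once performs the expensive step only once, but two of the three jobs spend that whole time idle. That idle time is what a separate dependency stage would move out of the parallel phase.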
This limitation often leads to workflows that follow a "run one sample first and, if it finishes, run all the others" approach.
With staged execution, an NGLess workflow could look like:
```bash
# (run once) Ensure ngless is correctly installed
ngless --check-install
# (run once) Check that the script is valid and inputs/outputs are as expected
ngless --validate-only script.ngl
# (run once/multiple) Download and index all dependencies (references, resources from internal modules, initialization of external modules, indexing, etc.)
ngless --ensure-dependencies script.ngl
# (run once/multiple) Interpret the script, possibly in parallel
ngless script.ngl
```
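The staged workflow above could be driven by a short script. This is a sketch under stated assumptions: `--ensure-dependencies` is only proposed in this issue and does not exist yet, the `NGLESS` variable and the `run_staged` function are hypothetical, and the script is assumed to use the `parallel` module so that concurrent copies process different samples.

```bash
# Hypothetical driver for the proposed staged workflow.
# NOTE: --ensure-dependencies is the flag *proposed* in this issue; it does not
# exist yet. NGLESS defaults to the ngless binary and can be overridden.
NGLESS=${NGLESS:-ngless}

# run_staged SCRIPT [NJOBS]: run the one-off stages in order, stopping at the
# first failure, then start NJOBS copies of the script concurrently and
# return non-zero if any copy failed.
run_staged() {
    script=$1
    njobs=${2:-4}
    "$NGLESS" --check-install || return 1
    "$NGLESS" --validate-only "$script" || return 1
    "$NGLESS" --ensure-dependencies "$script" || return 1

    # Interpretation stage: all dependencies are ready, so no job has to wait.
    pids=""
    i=0
    while [ "$i" -lt "$njobs" ]; do
        "$NGLESS" "$script" &
        pids="$pids $!"
        i=$((i + 1))
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}
```

Because every one-off stage finishes before any job is launched, a failure in validation or dependency preparation stops the run early instead of surfacing halfway through a batch of parallel jobs.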
An advantage of `--ensure-dependencies` is that resources could be downloaded, indexed, etc. in parallel, whereas this currently happens sequentially.
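As a rough illustration of that advantage, the sketch below prepares several independent resources concurrently with plain background jobs. `fetch_and_index`, `ensure_all` and the `LOG` variable are hypothetical placeholders for the real per-resource work; NGLess could equally parallelize this internally in a different way.

```bash
# Hypothetical placeholder for downloading and indexing one resource.
# Appends the resource name to the file named by $LOG when done.
fetch_and_index() {
    sleep 1                          # stand-in for a slow download/index step
    printf '%s\n' "$1" >> "$LOG"
}

# ensure_all RESOURCE...: prepare all resources concurrently and return
# non-zero if any of them failed. Wall time is roughly that of the slowest
# resource rather than the sum over all resources.
ensure_all() {
    pids=""
    for res in "$@"; do
        fetch_and_index "$res" &
        pids="$pids $!"
    done
    rc=0
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return "$rc"
}
```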
Additionally, running `script.ngl` would behave predictably for the user, regardless of whether it is the first time the command is executed.
This issue is also in line with #71, which proposes a `setup` phase for external modules. Such a phase would also run during `--ensure-dependencies`.
There is a hacky way to do this, which is to run with `--subsample` first. Not a great solution in terms of UX, but it works for now.
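The workaround above can be wrapped in a small function: a `--subsample` run acts as a warm-up that downloads and indexes the references before the full run. This is a minimal sketch; `warm_up_then_run` and `NGLESS` are hypothetical names, while `--subsample` is the existing NGLess flag which, as we understand it, runs the script on only a small fraction of the input.

```bash
# NGLESS defaults to the ngless binary; override it to use a wrapper.
NGLESS=${NGLESS:-ngless}

# warm_up_then_run SCRIPT: quick --subsample pass first (this triggers the
# downloads and indexing), then the real run once dependencies are in place.
warm_up_then_run() {
    script=$1
    "$NGLESS" --subsample "$script" || return 1
    "$NGLESS" "$script"
}
```

When using the `parallel` module, the warm-up would be run once, and only then would the parallel jobs be launched.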