sentinel-hub/eo-learn

[FEAT] Improve interaction between EOWorkflow and parallelization/serialization

zigaLuksic opened this issue · 1 comments

What is the problem? Please describe.

To use workflow.execute the user must provide a dictionary, where the keys are tasks and values are execution arguments for said tasks. During execution, the arguments for a task are retrieved by using the task as a lookup key.

This is problematic when the workflow and execution arguments are serialized, for example when trying to parallelize with the Ray library.

import ray

@ray.remote
def run_workflow(workflow, exec_args):
    # Both arguments are serialized when they are passed to the remote function
    workflow.execute(exec_args)

Because the objects are pickled and restored as copies, the task objects in workflow are no longer the same objects as the keys of exec_args, so the dictionary lookup fails.
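The failure can be reproduced without Ray. This is a minimal sketch, assuming the workflow and the execution arguments are serialized independently and that tasks use Python's default identity-based hashing. The Task class here is a hypothetical stand-in, not the eo-learn class.

```python
import pickle


class Task:
    """Hypothetical stand-in for a task; uses default identity-based hashing."""

    def __init__(self, name):
        self.name = name


task = Task("load")
workflow_tasks = [task]                    # stands in for the tasks held by the workflow
exec_args = {task: {"path": "some/path"}}  # execution arguments keyed by task object

# Round-trip each object through pickle on its own, as if sent as separate arguments
restored_tasks = pickle.loads(pickle.dumps(workflow_tasks))
restored_args = pickle.loads(pickle.dumps(exec_args))

# Before serialization the lookup works, because it is the very same object
lookup_before = exec_args.get(workflow_tasks[0])

# After serialization the restored task and the restored key are different objects
lookup_after = restored_args.get(restored_tasks[0])

print(lookup_before)
print(lookup_after)
```

The second lookup misses because the restored key and the restored task are two separate copies, even though they were created from the same original.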

Here's the solution

Tasks are already assigned a unique id upon initialization. This unique id is also used throughout the workflow class to identify tasks. It should therefore be reasonable to retrieve execution arguments by looking up unique ids instead of the task objects themselves. This can probably be done within the workflow class without changing its interface.
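A rough sketch of the idea, not the actual eo-learn implementation: the attribute name uid and the helpers args_by_uid and get_task_args are hypothetical, chosen only for illustration. The user still passes a dictionary keyed by task objects, and the re-keying happens internally.

```python
import pickle
import uuid


class Task:
    """Hypothetical task that receives a unique id at initialization."""

    def __init__(self, name):
        self.name = name
        self.uid = uuid.uuid4().hex  # hypothetical attribute name


def args_by_uid(exec_args):
    """Re-key a {task: args} dictionary by each task's unique id."""
    return {task.uid: args for task, args in exec_args.items()}


def get_task_args(task, uid_args):
    """Look up execution arguments by unique id instead of by object."""
    return uid_args.get(task.uid, {})


load = Task("load")
workflow_tasks = [load]
exec_args = {load: {"path": "some/path"}}

# Serialize the two objects independently, as in the failing scenario
restored_tasks = pickle.loads(pickle.dumps(workflow_tasks))
restored_args = pickle.loads(pickle.dumps(exec_args))

# The unique id survives pickling, so the lookup succeeds on the copies
uid_args = args_by_uid(restored_args)
found = get_task_args(restored_tasks[0], uid_args)
print(found)
```

Since the id is stored as ordinary instance state, it is carried through serialization, and the public interface of execute would not need to change.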

Improvements and Ray cluster support are now available on the develop-v1.0 branch.