[FEAT] Improve interaction between EOWorkflow and parallelization/serialization
zigaLuksic opened this issue · 1 comments
What is the problem? Please describe.
To use workflow.execute the user must provide a dictionary, where the keys are tasks and values are execution arguments for said tasks. During execution, the arguments for a task are retrieved by using the task as a lookup key.
This is problematic when the workflow and execution arguments are serialized, for example when trying to parallelize with the Ray library.
import ray

@ray.remote
def run_workflow(workflow, exec_args):
    workflow.execute(exec_args)
Because the objects are pickled, the tasks in workflow no longer directly match those in exec_args, so the dictionary lookup fails.
Here's the solution
Tasks are already assigned a unique id upon initialization. This unique id is also used throughout the workflow class to identify tasks. It should therefore be reasonable to retrieve execution arguments by looking up unique ids instead of objects themselves. This can probably be done within the workflow class without changing its interface.
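A minimal sketch of the idea, not the actual eo-learn implementation: Task and Workflow below are hypothetical stand-ins, the uid attribute name is an assumption, and copy.deepcopy is used only to mimic the workflow and the execution arguments being serialized independently. The interface of execute stays the same (a dictionary keyed by tasks), but the lookup inside goes through the unique ids.

```python
import copy
import uuid


class Task:
    """Hypothetical stand-in for a task; only the unique id matters here."""

    def __init__(self, name):
        self.name = name
        self.uid = uuid.uuid4().hex  # assumed attribute: id assigned once at init

    def execute(self, **kwargs):
        return self.name, kwargs


class Workflow:
    """Hypothetical stand-in for the workflow class."""

    def __init__(self, tasks):
        self.tasks = list(tasks)

    def execute(self, exec_args):
        # Re-key the user-supplied {task: kwargs} dictionary by unique id,
        # so the lookup no longer depends on object identity.
        args_by_uid = {task.uid: kwargs for task, kwargs in exec_args.items()}
        return [task.execute(**args_by_uid.get(task.uid, {})) for task in self.tasks]


load = Task("load")
workflow = Workflow([load])
exec_args = {load: {"path": "patch_0"}}

# Copy both objects separately to mimic independent serialization.
workflow_copy = copy.deepcopy(workflow)
exec_args_copy = copy.deepcopy(exec_args)

# The copied task object is no longer a key of the copied dictionary ...
assert workflow_copy.tasks[0] not in exec_args_copy

# ... but the id-based lookup still finds its arguments.
results = workflow_copy.execute(exec_args_copy)
```

The unique id is plain data, so it survives copying or pickling unchanged, whereas the default hash and equality of an object are tied to its identity.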
Improvements and Ray cluster support are now available on the develop-v1.0 branch.