[batch] Create Python Ray Runner based on Beam Portability API
Opened this issue · 0 comments
pdames commented
To support Beam pipeline graph construction, optimization, and submission for execution, we need to build a Ray Runner class based on Beam Portability API's FnApiRunner that supports both local (i.e. single-node) and remote (i.e. multi-node) pipeline execution.
At a high-level, its run_pipeline
method should take a Beam pipeline DAG to run as input, and produce a PipelineResult
that can describe/manage pipeline execution state as output. The Ray Runner Beam Components Doc provides additional details about the FnApiRunner's end-to-end workflow, and how it relates to Ray.
This runner will also depend on successful implementation of the following components:
- Ray Work Item Scheduler: The Ray Work Item Scheduler takes work items from the batch FnApiRunner's topological scheduler as input, and submits them for execution as Ray tasks.
- Ray Pipeline State Manager: The Ray Pipeline State Manager is a central service that consolidates the execution state of all scheduled pipeline work items in Ray's object store.