Job scheduling / remote execution pt. 1: Node to Node (Local/Remote) Coordination
zeeshanlakhani opened this issue · 0 comments
Summary
We want Homestar to be able to schedule workflows either locally or on remote nodes that it is connected to.
Components
- Distributed Scheduling Policy
  - Random
  - Local: run right away (as it is today)
  - Round-Robin (enhancement)
- Background process for picking up a pending workflow, gathering ahead-of-time (static) info, and determining who should run it based on the Distributed Scheduling Policy
  - If the job is being run remotely (i.e. the node that accepted the workflow is acting as the coordinator), determine a peer from connected peers and start a request-response for handing off the workflow and its static information
  - The remote, chosen node can then store that info locally, and pick up the pending workflow on its next tick to run it
- Add DB fields to the stored workflow info recording the coordinator vs. runner role and which peer (peer ID) is running and/or coordinating
- Handle ConnectionClosed event from the coordinator's perspective
- Notifications around how a job is scheduled and whether it's complete from the coordinator's perspective
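
To make the components above concrete, here is a minimal sketch in Rust of how the scheduling policy, the coordinator/runner fields, and the coordinator-side disconnect handling could fit together. Everything named here is hypothetical and not taken from the Homestar codebase: `PeerId` is a plain `String` stand-in, `Policy`, `Assignment`, `Status`, and `WorkflowRecord` are illustrative types, and `entropy` stands in for a value a real node would draw from an RNG. Two behaviors are assumptions rather than decisions made in this issue: falling back to local execution when no peers are connected, and re-queuing a handed-off workflow when its runner disconnects.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a network peer identifier.
pub type PeerId = String;

/// Where a pending workflow should run.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Assignment {
    /// Run on the node that accepted the workflow.
    Local,
    /// Hand off to a connected peer; the accepting node acts as coordinator.
    Remote(PeerId),
}

/// The three policies listed in this issue.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Policy {
    Local,
    Random,
    /// `next` is a cursor into the connected-peer list.
    RoundRobin { next: usize },
}

impl Policy {
    /// Picks a runner. `entropy` is only used by `Random`.
    /// Assumption: with no connected peers, every policy runs locally.
    pub fn assign(&mut self, peers: &[PeerId], entropy: u64) -> Assignment {
        if peers.is_empty() {
            return Assignment::Local;
        }
        match self {
            Policy::Local => Assignment::Local,
            Policy::Random => {
                let idx = (entropy % peers.len() as u64) as usize;
                Assignment::Remote(peers[idx].clone())
            }
            Policy::RoundRobin { next } => {
                let idx = *next % peers.len();
                *next = (idx + 1) % peers.len();
                Assignment::Remote(peers[idx].clone())
            }
        }
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Status {
    Pending,
    HandedOff,
    Complete,
}

/// Illustrative shape for the extra DB fields: who coordinates, who runs.
#[derive(Debug, Clone)]
pub struct WorkflowRecord {
    pub coordinator: PeerId,
    pub runner: Option<PeerId>,
    pub status: Status,
}

/// Coordinator-side reaction to a closed connection (assumed behavior):
/// workflows handed to `peer` that are not complete go back to pending.
/// Returns the affected workflow keys in sorted order.
pub fn on_connection_closed(
    records: &mut HashMap<String, WorkflowRecord>,
    peer: &PeerId,
) -> Vec<String> {
    let mut requeued = Vec::new();
    for (key, rec) in records.iter_mut() {
        if rec.status == Status::HandedOff && rec.runner.as_ref() == Some(peer) {
            rec.runner = None;
            rec.status = Status::Pending;
            requeued.push(key.clone());
        }
    }
    requeued.sort();
    requeued
}
```

With two connected peers, the round-robin policy alternates between them on successive calls to `assign`. The returned list from `on_connection_closed` is one possible hook for the coordinator-side notifications mentioned above.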