ipvm-wg/homestar

Job scheduling / remote execution pt. 1: Node to Node (Local/Remote) Coordination

zeeshanlakhani opened this issue · 0 comments

Summary

We want Homestar to be able to schedule workflows either locally or on remote nodes that it is connected to.

Components

  • Distributed Scheduling Policy
    • Random
    • Local: run right away (as it does today)
    • Round-Robin (enhancement)
  • Background process for picking up a pending workflow, gathering ahead-of-time (static) info, and determining which node should run it based on the Distributed Scheduling Policy
    • If the job is to be run remotely (i.e. the node that accepted the workflow is acting as the coordinator), select a peer from the connected peers and start a request-response exchange to hand off the workflow and its static information
    • The chosen remote node can then store that info locally and pick up the pending workflow on its next tick to run it
  • Add DB fields to the stored workflow info to record whether a node is the coordinator or the runner, and which peer (peer ID) is running and/or coordinating the workflow
  • Handle the ConnectionClosed event from the coordinator's perspective
  • Notifications about how a job was scheduled and whether it has completed, from the coordinator's perspective
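
To make the Distributed Scheduling Policy component more concrete, here is a minimal Rust sketch of how the three policies could map a pending workflow to a placement. Everything in it is hypothetical and not taken from the Homestar codebase: the `Policy`, `Placement`, and `Scheduler` names, the `String` stand-in for a peer ID, and the small xorshift generator used in place of a real RNG. It also assumes that Random and Round-Robin choose only among connected peers and fall back to local execution when no peer is connected, which is a decision the issue leaves open.

```rust
/// Hypothetical stand-in for a peer ID; a real node would use its
/// networking layer's peer identifier type instead.
type PeerId = String;

/// The three policies listed under "Distributed Scheduling Policy".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Policy {
    Local,
    Random,
    RoundRobin,
}

/// Where the coordinator decided the workflow should run.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Placement {
    Local,
    Remote(PeerId),
}

/// Hypothetical scheduler state kept by the coordinating node.
struct Scheduler {
    policy: Policy,
    /// Cursor for round-robin selection.
    next: usize,
    /// State for a tiny xorshift generator (assumption: a real
    /// implementation would use a proper RNG).
    rng_state: u64,
}

impl Scheduler {
    fn new(policy: Policy, seed: u64) -> Self {
        // xorshift must not start from zero, so clamp the seed to at least 1.
        Scheduler { policy, next: 0, rng_state: seed.max(1) }
    }

    /// Advance the xorshift64 state and return the next value.
    fn next_random(&mut self) -> u64 {
        let mut x = self.rng_state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng_state = x;
        x
    }

    /// Decide where a pending workflow should run, given the peers
    /// that are currently connected.
    fn place(&mut self, connected: &[PeerId]) -> Placement {
        // Assumption: with no connected peers, always run locally.
        if connected.is_empty() {
            return Placement::Local;
        }
        match self.policy {
            Policy::Local => Placement::Local,
            Policy::Random => {
                let i = (self.next_random() % connected.len() as u64) as usize;
                Placement::Remote(connected[i].clone())
            }
            Policy::RoundRobin => {
                let i = self.next % connected.len();
                self.next = self.next.wrapping_add(1);
                Placement::Remote(connected[i].clone())
            }
        }
    }
}
```

In this sketch, the background process would call `place` once per pending workflow on each tick; a `Placement::Remote` result is the point at which the coordinator would start the request-response hand-off and record the chosen peer ID in the new DB fields.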