snuspl/harmony

Scheduler for running multiple jobs resource-efficiently


The current JobServer runs each job on its own partition of resources.

However, jobs could run more efficiently if they shared resources.
To do this, we need to coordinate the jobs so that they run harmoniously without contention while maximizing resource utilization.

Specifically, we need to:

  • change the worker's trainer task so that it can be controlled in a more fine-grained manner.
  • introduce a component that controls the trainer tasks.
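A rough sketch of how these two pieces might fit together follows. It is a single-threaded simulation under several assumptions, not Harmony's actual design. The class names `ControllableTrainerTask` and `TrainerController`, the `slots` parameter, and the idea of using mini-batch boundaries as the control point are all hypothetical. Round-robin time-sharing of a fixed number of worker slots is just one possible policy, chosen here for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TrainerScheduling {

  /** Hypothetical control states the controller can place a trainer task in. */
  enum State { PAUSED, RUNNING, FINISHED }

  /** Hypothetical trainer task that advances one mini-batch per step, so it can be paused between steps. */
  static final class ControllableTrainerTask {
    final String jobId;
    private final int totalSteps;
    private int completedSteps = 0;
    private State state = State.PAUSED;

    ControllableTrainerTask(String jobId, int totalSteps) {
      this.jobId = jobId;
      this.totalSteps = totalSteps;
    }

    void resume() { if (state != State.FINISHED) state = State.RUNNING; }
    void pause()  { if (state != State.FINISHED) state = State.PAUSED; }
    State state() { return state; }
    int completedSteps() { return completedSteps; }

    /** Runs one mini-batch only if RUNNING; the step boundary is the (assumed) control point. */
    void step() {
      if (state != State.RUNNING) return;
      completedSteps++;
      if (completedSteps >= totalSteps) state = State.FINISHED;
    }
  }

  /** Hypothetical controller that time-shares a fixed number of worker slots among trainer tasks. */
  static final class TrainerController {
    private final int slots;
    private final Deque<ControllableTrainerTask> queue = new ArrayDeque<>();

    TrainerController(int slots) { this.slots = slots; }

    void submit(ControllableTrainerTask task) { queue.addLast(task); }

    /** One round: resume up to `slots` queued tasks, step each once, then pause and requeue unfinished ones. */
    List<String> runRound() {
      List<ControllableTrainerTask> active = new ArrayList<>();
      while (active.size() < slots && !queue.isEmpty()) {
        ControllableTrainerTask t = queue.pollFirst();
        t.resume();
        active.add(t);
      }
      List<String> ran = new ArrayList<>();
      for (ControllableTrainerTask t : active) {
        t.step();
        ran.add(t.jobId);
        if (t.state() != State.FINISHED) {
          t.pause();          // release the slot so other jobs can use it
          queue.addLast(t);   // round-robin: go to the back of the queue
        }
      }
      return ran;
    }

    boolean idle() { return queue.isEmpty(); }
  }

  public static void main(String[] args) {
    // Three jobs share two worker slots instead of each owning a fixed partition.
    TrainerController controller = new TrainerController(2);
    controller.submit(new ControllableTrainerTask("jobA", 2));
    controller.submit(new ControllableTrainerTask("jobB", 1));
    controller.submit(new ControllableTrainerTask("jobC", 2));
    int round = 0;
    while (!controller.idle()) {
      System.out.println("round " + (++round) + ": " + controller.runRound());
    }
  }
}
```

Running `main` prints `round 1: [jobA, jobB]`, `round 2: [jobC, jobA]`, and `round 3: [jobC]`. When jobB finishes, its slot goes to jobC rather than sitting idle, which is the kind of sharing the issue aims for. A real controller would also have to handle concurrency, model and state placement, and smarter policies than round-robin.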