armadaproject/armada

Better fairness metrics


(Definition: a blocked user is a user with queued jobs that would fit on an empty farm but were not scheduled in the most recent scheduling round.)
(1) For "resource" in [CPU, memory, disk, sum-of-fractions-of-capacity]: what fraction of the resource does each blocked user hold, relative to priority_thisuser / sum(priority among blocked users) (where larger priority = more jobs)? "Perfect fairness" would mean each user's line plot hovers around 1, with each user's line cutting out whenever they are not currently blocked.
(2) For "resource" in [CPU, memory, disk, sum-of-fractions-of-capacity]: for each blocked user, what is the smallest job (reported with its armada-id) that didn't get scheduled in the last scheduling round? (For gangs, maybe just sum across the jobs in the gang?) If the previous plot is not "perfectly fair", people would probably be less likely to complain if they had asked for (32 core, 500 GB RAM) jobs while the top user asked for (1 core, 16 GB) jobs. I personally still think the scheduler should try harder to give big jobs an equal share, but it's at least understandable that this is a non-trivial problem.
(3) For "resource" in [CPU, memory, disk, sum-of-fractions-of-capacity]: a plot of cumulative (from midnight each day) preempted resource multiplied by runtime at the point of preemption, broken down by user and stacked over all users.
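
To make the three metrics concrete, here is a rough Python sketch of how each could be computed for a single resource. None of this is Armada code: `UserSnapshot`, `fairness_ratios`, `smallest_unscheduled` and `cumulative_preempted` are hypothetical names, and the input shapes are assumptions. In particular, for (1) it assumes "fraction of resource" means the fraction of the resource held by blocked users only, and for (3) it assumes the caller passes only the events since midnight.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UserSnapshot:
    """One user's state after a scheduling round (hypothetical shape)."""
    user: str
    priority: float   # larger = entitled to more, as stated in the issue
    allocated: float  # amount of the resource currently held, e.g. CPU cores
    blocked: bool     # has queued jobs that fit an empty farm but were not scheduled


def fairness_ratios(snapshots):
    """Metric (1): actual share / entitled share, over blocked users only."""
    blocked = [s for s in snapshots if s.blocked and s.priority > 0]
    total_priority = sum(s.priority for s in blocked)
    total_allocated = sum(s.allocated for s in blocked)
    if total_priority <= 0 or total_allocated <= 0:
        return {}  # nothing to compare against this round
    return {
        s.user: (s.allocated / total_allocated) / (s.priority / total_priority)
        for s in blocked
    }


def smallest_unscheduled(jobs):
    """Metric (2): per user, the (size, job_id) of the smallest job left queued.

    `jobs` is an iterable of (user, job_id, size) for jobs that were not
    scheduled; a gang would be passed as one entry with its summed size.
    """
    smallest = {}
    for user, job_id, size in jobs:
        if user not in smallest or size < smallest[user][0]:
            smallest[user] = (size, job_id)
    return smallest


def cumulative_preempted(events):
    """Metric (3): running per-user total of resource * runtime at preemption.

    `events` is an iterable of (timestamp, user, resource, runtime_seconds)
    covering a single day; the result is one point per event, in time order.
    """
    totals = defaultdict(float)
    series = []
    for timestamp, user, resource, runtime_seconds in sorted(events):
        totals[user] += resource * runtime_seconds
        series.append((timestamp, user, totals[user]))
    return series


if __name__ == "__main__":
    round_state = [
        UserSnapshot("alice", priority=2.0, allocated=60.0, blocked=True),
        UserSnapshot("bob", priority=1.0, allocated=40.0, blocked=True),
        UserSnapshot("carol", priority=1.0, allocated=500.0, blocked=False),
    ]
    # alice comes out below 1 (under-served), bob above 1; carol is not blocked
    # and therefore has no point in this round.
    print(fairness_ratios(round_state))
```

Emitting one value per blocked user per round (and nothing for unblocked users) gives the "line cuts out" behaviour described in (1); the stacked plot in (3) is just the per-user series from `cumulative_preempted` layered on top of each other.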