Nested accounts missing from fairshare
Xaraxia opened this issue · 2 comments
Hi,
We have a nested account arrangement, and those accounts aren't properly being reported on.
I dug into the code, and the command is:
$ sshare -n -P -o account,fairshare
root|0.500000
top_1|0.999998
nested_1_1|0.999998
nested_1_2|1.000000
nested_1_2_1|1.000000
top_2|0.481723
nested_2_1|0.858038
nested_2_2|0.961831
However, when I get the metrics, I only get root, top_1 and top_2.
'root' isn't useful. Top-level accounts are useful as an aggregate, but I'd also like to see the nested accounts.
Ideally, we would have "slurm_account_fairshare" as it is, and also offer "slurm_subaccount_fairshare" so that I could graph both.
Looks like ParseFairShareMetrics() is the culprit, throwing away anything that starts with more than one space.
if ! strings.HasPrefix(line," ") {
I can see the argument for doing it, hence my proposal to gather two sets of metrics.
This is what is actually coming out of the exporter:
slurm_account_fairshare{account="top_1"} 0.999998
slurm_account_fairshare{account="root"} 1
slurm_account_fairshare{account="top_2"} 0.481723
So perhaps the right answer is to export something like:
slurm_account_fairshare{account="root"} 1
slurm_account_fairshare{account="top_1", parent_account="root", account_depth="1"} 0.999998
slurm_account_fairshare{account="nested_1_2", parent_account="top_1", account_depth="2"} 1.000000
slurm_account_fairshare{account="nested_1_2_1", parent_account="nested_1_2", account_depth="3"} 1.000000
I'm happy to cut some code to do this if you can give me some recommendations.
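To make the proposal concrete, here is a rough Go sketch of how the parent and depth labels could be derived. It assumes that sshare indents each nesting level by exactly one leading space in the account column (that indentation is not visible in the paste above, so treat it as an assumption). `AccountShare` and `ParseNestedFairShare` are hypothetical names, not existing exporter code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// AccountShare is a hypothetical record for one row of sshare output.
type AccountShare struct {
	Account   string
	Parent    string // empty for the root account
	Depth     int    // 0 for root, 1 for top-level accounts, and so on
	FairShare float64
}

// ParseNestedFairShare parses `sshare -n -P -o account,fairshare` output.
// Assumption: each nesting level adds one leading space to the account name.
func ParseNestedFairShare(out string) []AccountShare {
	var rows []AccountShare
	var stack []string // stack[d] holds the most recent account seen at depth d
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Split(line, "|")
		if len(fields) < 2 {
			continue
		}
		name := strings.TrimSpace(fields[0])
		if name == "" {
			continue
		}
		depth := len(fields[0]) - len(strings.TrimLeft(fields[0], " "))
		if depth > len(stack) {
			depth = len(stack) // tolerate unexpected extra indentation
		}
		parent := ""
		if depth > 0 {
			parent = stack[depth-1]
		}
		// Record the hierarchy before parsing the value, so a row with a
		// blank fairshare still acts as a parent for the rows below it.
		stack = append(stack[:depth], name)
		value, err := strconv.ParseFloat(strings.TrimSpace(fields[1]), 64)
		if err != nil {
			continue
		}
		rows = append(rows, AccountShare{Account: name, Parent: parent, Depth: depth, FairShare: value})
	}
	return rows
}

func main() {
	sample := "root|0.500000\n top_1|0.999998\n  nested_1_2|1.000000\n   nested_1_2_1|1.000000\n"
	for _, r := range ParseNestedFairShare(sample) {
		if r.Depth == 0 {
			fmt.Printf("slurm_account_fairshare{account=%q} %f\n", r.Account, r.FairShare)
			continue
		}
		fmt.Printf("slurm_account_fairshare{account=%q, parent_account=%q, account_depth=\"%d\"} %f\n",
			r.Account, r.Parent, r.Depth, r.FairShare)
	}
}
```

Keeping a single metric name and adding labels means existing queries on `account` keep working, while `account_depth` lets a dashboard filter to top-level accounts only.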
Tangentially related, but noting it here in case anyone else comes by looking for it as I did. I was looking into something similar, where fairshare metrics were missing from all accounts. When the Fair Tree fairshare algorithm is used (the default since Slurm 19.05), sshare makes no attempt to calculate a fairshare value for anything other than users. For accounts, (double)NO_VAL64 is hardcoded, and this appears to be rendered as a blank: https://github.com/SchedMD/slurm/blob/master/src/sshare/process.c#L261
This manifests as the exporter reporting 0 for all accounts. We considered patching the exporter to report LevelFS instead, which sshare does produce for accounts, but we're not sure how best to deal with infinity.
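One possible way to handle both the blank value and infinity, as a sketch only: skip the sample when the field is blank, and pass infinity straight through, since the Prometheus text format accepts `+Inf` as a sample value. Go's `strconv.ParseFloat` already recognises "inf" and "infinity" case-insensitively. The exact string sshare prints for an infinite LevelFS is an assumption here, and `parseShareField` is a hypothetical helper, not part of the exporter.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseShareField converts one sshare column to a float64.
// ok is false when the field is blank or unparsable, so the caller can
// skip the sample instead of exporting a misleading 0.
func parseShareField(field string) (value float64, ok bool) {
	field = strings.TrimSpace(field)
	if field == "" {
		return 0, false
	}
	v, err := strconv.ParseFloat(field, 64)
	if err != nil || math.IsNaN(v) {
		return 0, false
	}
	return v, true // may be +Inf, which Prometheus can represent
}

func main() {
	// "inf" stands in for however sshare renders an infinite LevelFS (assumed).
	for _, f := range []string{"0.481723", "", "inf"} {
		v, ok := parseShareField(f)
		fmt.Printf("%q -> value=%v ok=%v\n", f, v, ok)
	}
}
```

Whether an infinite LevelFS is useful on a graph is a separate question; clamping it to a large finite number would be an alternative, at the cost of inventing a value that sshare never reported.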