kubernetes/kubernetes

scheduler_perf: define thresholds per test case and set up alerts for results

sanposhiho opened this issue · 8 comments

/kind feature
/sig scheduling

Discussion with sig-scalability: https://kubernetes.slack.com/archives/C09QZTRH7/p1715262959575039

What

We have scheduler_perf, and it'd be great to have some form of alerting based on its results.

Based on the discussion with sig-scalability, the easiest way is to change scheduler_perf so that it can fail if the results show degradation, and monitor/alert the failures via testgrid.

For "if the results show degradation", we probably have to define reasonable thresholds per test case.
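
A minimal sketch of what a per-test-case threshold check could look like, assuming each test case reports a single average throughput number. The `result` type, the `checkThreshold` helper, the test case names, and the threshold values are all hypothetical and are not taken from the existing scheduler_perf code:

```go
package main

import "fmt"

// result is a hypothetical summary of one scheduler_perf test case run.
type result struct {
	testCase   string
	throughput float64 // average scheduled pods per second (assumed metric)
}

// checkThreshold returns an error when the measured throughput is below the
// minimum defined for that test case. Test cases without a threshold pass.
func checkThreshold(r result, thresholds map[string]float64) error {
	minimum, ok := thresholds[r.testCase]
	if !ok {
		return nil // no threshold defined yet, nothing to enforce
	}
	if r.throughput < minimum {
		return fmt.Errorf("%s: throughput %.1f pods/s is below threshold %.1f pods/s",
			r.testCase, r.throughput, minimum)
	}
	return nil
}

func main() {
	// Made-up names and numbers, for illustration only.
	thresholds := map[string]float64{
		"ExampleCaseA": 200,
		"ExampleCaseB": 50,
	}
	results := []result{
		{testCase: "ExampleCaseA", throughput: 250},
		{testCase: "ExampleCaseB", throughput: 30},
	}
	for _, r := range results {
		if err := checkThreshold(r, thresholds); err != nil {
			// In a real test this would be reported via t.Errorf so that
			// the job fails and the failure shows up in testgrid.
			fmt.Println("FAIL:", err)
			continue
		}
		fmt.Println("ok:", r.testCase)
	}
}
```

Failing the test run itself (rather than only recording the numbers) is what would let testgrid pick up the degradation as a regular job failure.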

Why

The current pain point is that perf-dash visualizes the results, but no one really watches it, and consequently we've overlooked degradation several times.

@kubernetes/sig-scheduling-misc any feedback for the direction proposed above?

+1 from me.

Also, the dashboard doesn't load for me (unless the link is wrong?)

Nvm, it loads :)

/assign

I just assigned it to myself so that it remains on my todo list, but it might take some time for me to come back to it because of other prioritized tickets. So, if anyone wants, feel free to take over (I can help with reviews either way).

Can I help you?

Yes,
/assign @utam0k