kernel: Implement core event counters
htejun opened this issue · 5 comments
The kernel sometimes has to take actions that override the BPF scheduler's decisions. Non-exhaustive list:
- If ops.select_cpu() returns a CPU which can't be used by the task, the core scheduler code silently picks a fallback CPU.
- When dispatching to a local DSQ, the CPU may have gone offline in the meantime. In this case, the task is bounced to the global DSQ.
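To make the first override visible rather than silent, the kernel could count it at the point where the fallback happens. The sketch below is a hypothetical userspace model, not the actual core scheduler code: the function name pick_cpu_checked(), the counter nr_select_cpu_fallback, the 64-bit allowed-CPU mask and the "lowest allowed CPU" fallback policy are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical global counter, for illustration only. */
static uint64_t nr_select_cpu_fallback;

/*
 * Hypothetical check applied to the CPU returned by ops.select_cpu().
 * allowed_mask has bit N set if CPU N may run the task; nr_cpus is the
 * number of CPUs in the system (at most 64 in this sketch).
 * Returns the CPU to use, or -1 if no CPU is allowed.
 */
static int pick_cpu_checked(int chosen, uint64_t allowed_mask, int nr_cpus)
{
	assert(nr_cpus > 0 && nr_cpus <= 64);

	/* Ignore bits for CPUs that do not exist. */
	if (nr_cpus < 64)
		allowed_mask &= (UINT64_C(1) << nr_cpus) - 1;

	/* The BPF scheduler's choice is valid: use it unchanged. */
	if (chosen >= 0 && chosen < nr_cpus &&
	    (allowed_mask & (UINT64_C(1) << chosen)))
		return chosen;

	if (allowed_mask == 0)
		return -1;

	/* Override the choice, but record that it happened. */
	nr_select_cpu_fallback++;

	/* Lowest allowed CPU (__builtin_ctzll is a GCC/Clang builtin). */
	return __builtin_ctzll(allowed_mask);
}
```

For example, with allowed_mask = 0x6 (CPUs 1 and 2) on a 4-CPU system, a choice of CPU 0 is replaced by CPU 1 and the counter goes up by one, while a choice of CPU 2 passes through untouched.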
In addition, there are common events that can be interesting but not easily visible:
- If SCX_OPS_ENQ_LAST is not set, the number of times that a task continued to run because there were no other tasks on the CPU.
- Similarly for SCX_OPS_ENQ_EXITING.
- Similarly for bypass mode.
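Counters like these are typically kept per CPU, so that the hot paths only increment CPU-local memory, and are summed only when someone reads them. A minimal sketch, assuming a small fixed CPU count and hypothetical event names (EV_*) that mirror the cases listed above. In the kernel the table would be a per-CPU variable updated with preemption disabled rather than a plain 2D array; this single-threaded model does not show that, and the merged implementation may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS_MAX 8	/* small fixed size for the sketch */

/* Hypothetical event IDs mirroring the cases listed above. */
enum sketch_event {
	EV_SELECT_CPU_FALLBACK,		/* invalid CPU from ops.select_cpu() */
	EV_DISPATCH_LOCAL_DSQ_OFFLINE,	/* bounced to the global DSQ */
	EV_DISPATCH_KEEP_LAST,		/* SCX_OPS_ENQ_LAST not set, task kept running */
	EV_ENQ_SKIP_EXITING,		/* SCX_OPS_ENQ_EXITING not set */
	EV_BYPASS_DISPATCH,		/* dispatched while in bypass mode */
	NR_EVENTS,
};

/* One row per CPU, so an increment only touches that CPU's row. */
static uint64_t event_counts[NR_CPUS_MAX][NR_EVENTS];

static void event_inc(int cpu, enum sketch_event ev)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MAX && ev < NR_EVENTS);
	event_counts[cpu][ev]++;
}

/* Sum every CPU's counters into out[], e.g. to answer a kfunc query. */
static void event_sum(uint64_t out[NR_EVENTS])
{
	memset(out, 0, NR_EVENTS * sizeof(out[0]));
	for (int cpu = 0; cpu < NR_CPUS_MAX; cpu++)
		for (int ev = 0; ev < NR_EVENTS; ev++)
			out[ev] += event_counts[cpu][ev];
}
```

The trade-off is the usual one for statistics counters: writes stay cheap and contention-free, and the cost moves to the comparatively rare read, which has to walk every CPU.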
Statistics like the above can be collected and made accessible to the BPF scheduler via a kfunc interface for visibility and sanity checks. It may also make sense to implement a threshold mechanism so that, e.g., if the BPF scheduler picks an invalid CPU more than 2% of the time over some duration, an ops error is triggered.
The first patchset was posted: https://lore.kernel.org/lkml/20250116151543.80163-1-changwoo@igalia.com/
The following seven events were merged.
In addition, I will add one more useful event, which counts how many times the default time slice has been set, and add a filesystem interface for easily peeking at the event counters from userspace.
One more event was added:
sysfs and tracepoints were added.