kernel: Implement core event counters

Question

htejun opened this issue 5 months ago · 5 comments

The kernel sometimes has to take actions that override the BPF scheduler's decisions. Non-exhaustive list:

If ops.select_cpu() returns a CPU which can't be used by the task, the core scheduler code silently picks a fallback CPU.
When dispatching to a local DSQ, the CPU may have gone offline in the meantime. In this case, the task is bounced to the global DSQ.

In addition, there are common events that can be interesting but not easily visible:

If SCX_OPS_ENQ_LAST is not set, the number of times that a task continued to run because there were no other tasks on the CPU.
Similar for SCX_OPS_ENQ_EXITING.
Similar for bypass mode.

Statistics like the above can be collected and made accessible to the BPF scheduler via a kfunc interface for visibility and sanity checks. It may also make sense to implement a threshold mechanism so that, for example, if the BPF scheduler picks an invalid CPU more than 2% of the time for some duration, an ops error is triggered.
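To make the threshold idea concrete, here is a minimal user-space sketch in C. It is not kernel code and does not reflect the merged implementation: the struct names, field names, and the "N consecutive windows" policy are all assumptions made for illustration. It samples two monotonically increasing counters once per window and reports a trip when the fallback ratio stays above the limit for several windows in a row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of two monotonically increasing counters. */
struct ev_snapshot {
	uint64_t select_cpu_calls;     /* total ops.select_cpu() invocations */
	uint64_t select_cpu_fallbacks; /* times the core picked a fallback CPU */
};

/* Hypothetical threshold state; none of these names exist in the kernel. */
struct ev_threshold {
	struct ev_snapshot prev; /* counters at the end of the previous window */
	uint32_t limit_permille; /* 20 means 2% */
	uint32_t windows_needed; /* consecutive bad windows before tripping */
	uint32_t bad_windows;    /* current streak of bad windows */
};

void ev_threshold_init(struct ev_threshold *t, uint32_t limit_permille,
		       uint32_t windows_needed)
{
	assert(t);
	t->prev.select_cpu_calls = 0;
	t->prev.select_cpu_fallbacks = 0;
	t->limit_permille = limit_permille;
	t->windows_needed = windows_needed;
	t->bad_windows = 0;
}

/*
 * Feed one end-of-window snapshot. Returns true once the fallback ratio
 * has exceeded the limit for windows_needed consecutive windows.
 */
bool ev_threshold_update(struct ev_threshold *t, struct ev_snapshot now)
{
	uint64_t calls, fallbacks;

	assert(t);
	calls = now.select_cpu_calls - t->prev.select_cpu_calls;
	fallbacks = now.select_cpu_fallbacks - t->prev.select_cpu_fallbacks;
	t->prev = now;

	/* A window with no calls gives no evidence; keep the streak as is. */
	if (calls == 0)
		return t->bad_windows >= t->windows_needed;

	/*
	 * fallbacks / calls > limit / 1000, rewritten without division.
	 * Overflow is ignored here; per-window deltas are assumed small.
	 */
	if (fallbacks * 1000 > calls * t->limit_permille)
		t->bad_windows++;
	else
		t->bad_windows = 0;

	return t->bad_windows >= t->windows_needed;
}
```

Requiring several consecutive bad windows is one way to express "for some duration": a single noisy window does not trip the check, and one good window resets the streak. A real mechanism might instead use a sliding window or a decaying average.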

Answer 1 · 2025-01-17T00:53:34.000Z

The first patchset was posted: https://lore.kernel.org/lkml/20250116151543.80163-1-changwoo@igalia.com/

Answer 2 · 2025-02-05T07:44:06.000Z

The following seven events were merged.

Answer 3 · 2025-02-05T07:48:50.000Z

In addition, I will add one more useful event, which counts how many times the default time slice has been set, and a filesystem interface for easily peeking at the event counters from userspace.

Answer 4 · 2025-02-08T00:03:01.000Z

One more event was added:

SCX_EV_ENQ_SLICE_DFL

Answer 5 · 2025-03-05T01:13:08.000Z

Sysfs and tracepoint support was added.