sched-ext/scx

kernel: Implement core event counters

htejun opened this issue · 5 comments

The kernel sometimes has to take actions that override the BPF scheduler's decisions. Non-exhaustive list:

  • If ops.select_cpu() returns a CPU which can't be used by the task, the core scheduler code silently picks a fallback CPU.
  • When dispatching to a local DSQ, the CPU may have gone offline in the meantime. In this case, the task is bounced to the global DSQ.

In addition, there are common events that can be interesting but not easily visible:

  • If SCX_OPS_ENQ_LAST is not set, the number of times that a task continued to run because there were no other tasks on the CPU.
  • Similar for SCX_OPS_ENQ_EXITING.
  • Similar for bypass mode.

Statistics like the above can be collected and made accessible to the BPF scheduler via a kfunc interface for visibility and sanity checks. It may also make sense to implement a threshold mechanism so that, e.g., if the BPF scheduler picks an invalid CPU more than 2% of the time over some duration, an ops error is triggered, and so on.
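To make the threshold idea concrete, here is a rough userspace sketch in plain C, not kernel code. Every name in it (`ev_snapshot`, `ev_record_select_cpu`, `ev_fallback_over_pct`) is hypothetical and made up for illustration. It assumes the counters are monotonically increasing and that "for some duration" is handled by comparing two snapshots taken at the start and end of a window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical counter snapshot. Field names are illustrative only and
 * are not taken from the kernel.
 */
struct ev_snapshot {
	uint64_t nr_select_cpu;          /* all ops.select_cpu() results seen */
	uint64_t nr_select_cpu_fallback; /* results the core had to override */
};

/* Record one ops.select_cpu() result; count a fallback if it was unusable. */
static void ev_record_select_cpu(struct ev_snapshot *s, bool cpu_usable)
{
	assert(s);
	s->nr_select_cpu++;
	if (!cpu_usable)
		s->nr_select_cpu_fallback++;
}

/*
 * Return true if, between the two snapshots, strictly more than @pct
 * percent of select_cpu results needed a fallback. An empty window never
 * trips the threshold. Integer math avoids floating point; the
 * multiplications are assumed not to overflow for realistic window sizes.
 */
static bool ev_fallback_over_pct(const struct ev_snapshot *start,
				 const struct ev_snapshot *end,
				 unsigned int pct)
{
	assert(start && end);

	uint64_t total = end->nr_select_cpu - start->nr_select_cpu;
	uint64_t bad = end->nr_select_cpu_fallback -
		       start->nr_select_cpu_fallback;

	if (!total)
		return false;
	return bad * 100 > total * pct;
}
```

A caller would take a snapshot at the start of each window, let the counters accumulate, and then call `ev_fallback_over_pct(&start, &cur, 2)` to decide whether to raise an error. Comparing deltas rather than lifetime totals keeps a long healthy history from masking a recent burst of fallbacks.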

In addition, I will add one more useful event, which counts how many times the default time slice has been set, and add a filesystem interface so that the event counters can be peeked at easily from userspace.

One more event was added:

A sysfs interface and a tracepoint were added.