andrewrk/poop

look into CPU shielding


https://manpages.ubuntu.com/manpages/trusty/man1/cset-shield.1.html

I just learned about this today (thanks @Verdagon!). Maybe whatever syscalls it is using under the hood could be a nice way to make poop obtain less noisy measurements.

Here is an in-depth manual, updated 8 days ago: https://documentation.suse.com/sle-rt/12-SP5/single-html/SLE-RT-shielding/. The upstream repository, including its issue tracker, is https://github.com/lpechacek/cpuset.

Please note that cset-shield is written in Python, is licensed under GPLv2, and is a few thousand LOC.

The manual also describes some unpleasant quirks:

Note. There is a minor chance that a task forks during move and its child remains in the root cpuset. 

I think the author did not want to deal with strace and/or pid 1/process group tracking, which adds another level of complexity and would be inefficient in Python.

AFAIU, four things are needed:

    1. setup of the shield configuration (creating the shield and moving tasks into it)
    2. running the process with options
    3. teardown of the shield configuration / reset of the system state (moving tasks out and deleting the shield)
    4. detection of whether a CPU shield is currently active (including information on how to clean it up manually)
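
To make the four steps more concrete, here is a rough Python sketch of how they might map onto the cpuset pseudo-filesystem. It assumes the file names described in cpuset(7) (`cpuset.cpus`, `cpuset.mems`, `tasks`) and a mount point such as `/dev/cpuset`; legacy mounts may drop the `cpuset.` prefix, and cgroup v2 uses a different layout. All function names are hypothetical, this is not how cset is implemented, and it ignores the fork race entirely.

```python
import os
import subprocess
from pathlib import Path


def write_value(path, value):
    # The pseudo-files take plain text; a write is how a setting is changed.
    Path(path).write_text(value + "\n")


def create_shield(root, name, cpus, mems="0"):
    # Step 1: on a real cpuset mount, mkdir creates the child cpuset.
    # cpus and mems must both be set before tasks can be attached.
    shield = Path(root) / name
    shield.mkdir(exist_ok=True)
    write_value(shield / "cpuset.cpus", cpus)
    write_value(shield / "cpuset.mems", mems)
    return shield


def move_task(cpuset, pid):
    # Attach one task by writing its pid to the cpuset's tasks file.
    write_value(Path(cpuset) / "tasks", str(pid))


def list_tasks(cpuset):
    text = (Path(cpuset) / "tasks").read_text()
    return [int(line) for line in text.split()]


def run_in_shield(shield, argv):
    # Step 2: the child attaches itself before exec, so the measured
    # program starts inside the shield.
    return subprocess.run(
        argv, preexec_fn=lambda: move_task(shield, os.getpid())
    )


def shield_exists(root, name):
    # Step 4: a leftover directory means a shield is (still) configured.
    return (Path(root) / name).is_dir()


def destroy_shield(root, name):
    # Step 3: move remaining tasks back to the root cpuset, then rmdir.
    # On a real mount, rmdir works once the cpuset has no tasks or children.
    shield = Path(root) / name
    for pid in list_tasks(shield):
        try:
            move_task(root, pid)
        except OSError:
            pass  # the task may have exited in the meantime
    os.rmdir(shield)
```

On a real system this would need root, e.g. `create_shield("/dev/cpuset", "shield", "2-3")`. Moving the existing system tasks out of the shielded CPUs is left out of the sketch.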

I think a partial reimplementation in Zig should start with the quirk, i.e. moving processes while correctly handling forks. I plan to write up the underlying problem soon.
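
Until that writeup exists, here is a minimal Python sketch of one way the fork quirk could be narrowed: repeat the move until a pass finds nothing left in the root cpuset. The helper name `move_all` and its callable parameters are hypothetical. It only shrinks the race window and does not close it, because a task can still fork after the last pass.

```python
def move_all(list_tasks, move_task, max_passes=10):
    """Repeatedly move tasks until a pass finds nothing left to move.

    list_tasks() returns the pids currently in the source cpuset.
    move_task(pid) moves one pid and raises OSError if it cannot be moved.
    Returns True if the source held only unmovable tasks at the last check.
    """
    unmovable = set()  # tasks that refused to move; skipped in later passes
    for _ in range(max_passes):
        pending = [pid for pid in list_tasks() if pid not in unmovable]
        if not pending:
            return True
        for pid in pending:
            try:
                move_task(pid)
            except OSError:
                unmovable.add(pid)
    # Out of passes: report whether anything movable is still left behind.
    return not [pid for pid in list_tasks() if pid not in unmovable]
```

A child forked while its parent is being moved shows up in the next `list_tasks()` call and is picked up by the following pass. Whether the kernel gives any stronger guarantee is exactly the open question.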

I do not yet understand what timing guarantees the kernel provides about when reads and writes to the pseudo-filesystem take effect, so I asked the author of the tool, along with some polite hints on how to fix some Python issues: SUSE/cpuset#46

I hope the kernel offers callbacks or some other notification mechanism, because otherwise we would need to do dirty waiting and "hope that it has applied", leaving the door open to spurious failures, even if tracking clone were handled, for example via strace.
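
If it does come down to dirty waiting, it could at least be bounded and verified by reading the state back. The Python sketch below polls with a deadline and checks the `Cpus_allowed_list` field of `/proc/<pid>/status`. The helper names are hypothetical, and it assumes, without confirmation from the kernel side, that this field reflects the cpuset change once it has been applied.

```python
import time


def parse_cpus_allowed(status_text):
    # Extract the Cpus_allowed_list value from /proc/<pid>/status content.
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return line.split(":", 1)[1].strip()
    return None


def wait_until(predicate, timeout=1.0, interval=0.01):
    # Poll predicate() until it is true or the deadline passes.
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def task_is_confined(pid, expected_cpus):
    # True if the task's allowed CPU list matches, e.g. "2-3".
    with open(f"/proc/{pid}/status") as f:
        return parse_cpus_allowed(f.read()) == expected_cpus
```

Usage would look like `wait_until(lambda: task_is_confined(pid, "2-3"))`, which turns "hope that it has applied" into an explicit timeout that can be reported as an error.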

General overview: https://man7.org/linux/man-pages/man7/cpuset.7.html