High CPU usage on system time change
jetomit opened this issue · 7 comments
Since Guix upgraded to guile-fibers 1.3.1, shepherd hangs shortly after boot on systems without an RTC. I believe the problem comes from using `get-internal-real-time` in the guile-fibers timer wheel implementation. After NTP corrects the system time, this function returns a much larger value, and the CPU load (for one core) goes to 100%.
Profiling suggests the process spends the CPU time in `timer-wheel-advance!`, so I imagine it is trying to tick through a five-year time diff. I tried increasing the system time manually by N days, which causes shepherd to be unresponsive (e.g. to `herd status`) for about N×5 seconds. I observed similar behavior with the example from the guile-fibers README.
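To illustrate why a clock jump could translate into CPU time, here is a minimal sketch of a toy single-level timer wheel that advances one tick at a time. It is written in Python for illustration only and is not the guile-fibers implementation; the class name `NaiveTimerWheel`, the slot count, and the restriction to timers within one revolution are all assumptions made for the example.

```python
class NaiveTimerWheel:
    """Toy single-level timer wheel that advances one tick at a time."""

    def __init__(self, slots=256, start=0):
        self.slots = [[] for _ in range(slots)]
        self.now = start  # current time, in ticks

    def add(self, when, callback):
        # Assumption for this sketch: timers must fall within one revolution.
        assert 0 < when - self.now <= len(self.slots)
        self.slots[when % len(self.slots)].append((when, callback))

    def advance(self, target):
        """Step to `target` tick by tick; return the number of steps taken."""
        steps = 0
        while self.now < target:
            self.now += 1
            slot = self.slots[self.now % len(self.slots)]
            due = [cb for (when, cb) in slot if when <= self.now]
            slot[:] = [(when, cb) for (when, cb) in slot if when > self.now]
            for cb in due:
                cb()
            steps += 1
        return steps
```

In this sketch the work done by `advance` grows linearly with the distance between the old and new time, even when no timers are pending, which would match the observation that unresponsiveness scales with the size of the jump.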
Replacing all instances of `(get-internal-real-time)` with `(clock-gettime 1)` in guile-fibers, and reconfiguring the system with the patched package, fixes this problem. I think using a monotonic clock makes sense, but there is probably a cleaner / more portable way to do it.
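The difference between a wall clock and a monotonic clock can be sketched as follows. This is a Python illustration rather than Guile code; the `Deadline` class and its injectable `clock` parameter are hypothetical names introduced for the example.

```python
import time


class Deadline:
    """A timeout measured against an injectable clock (monotonic by default)."""

    def __init__(self, delay, clock=time.monotonic):
        self._clock = clock
        self._due = clock() + delay

    def remaining(self):
        # Seconds left until the deadline, never negative.
        return max(0.0, self._due - self._clock())

    def expired(self):
        return self._clock() >= self._due
```

If the clock passed in is a wall clock that gets stepped forward by a day, a five-second deadline expires at once; a monotonic clock is not subject to such steps, so the deadline only expires after the delay has actually elapsed.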
Thanks!
Hi @jetomit!
Using `CLOCK_MONOTONIC` as you suggest seemed like the right choice to me so I started working on it. However, the API of `(fibers timers)` as well as `schedule-task-at-time` expect "internal time units"; changing `timer-wheel` to use `CLOCK_MONOTONIC` would affect those interfaces similarly, which is not acceptable.
Instead we should probably change `timer-wheel-advance!` to cope with large gaps.
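One way an advance routine might cope with large gaps is to bound the work by the number of slots rather than by the size of the gap. The following is only a sketch in Python under simplifying assumptions (a single-level wheel, timers limited to one revolution); `GapTolerantWheel` is a hypothetical name, and this is not a proposed patch for guile-fibers.

```python
class GapTolerantWheel:
    """Toy single-level timer wheel whose advance cost is capped by slot count."""

    def __init__(self, slots=256, start=0):
        self.slots = [[] for _ in range(slots)]
        self.now = start  # current time, in ticks

    def add(self, when, callback):
        # Assumption for this sketch: timers must fall within one revolution.
        assert 0 < when - self.now <= len(self.slots)
        self.slots[when % len(self.slots)].append((when, callback))

    def advance(self, target):
        """Jump to `target`, visiting each slot at most once.

        Returns the number of slots visited.
        """
        n = len(self.slots)
        gap = target - self.now
        if gap <= 0:
            return 0
        due = []
        visited = min(gap, n)  # never more than one full revolution
        for offset in range(1, visited + 1):
            slot = self.slots[(self.now + offset) % n]
            due.extend(t for t in slot if t[0] <= target)
            slot[:] = [t for t in slot if t[0] > target]
        self.now = target
        # Fire expired timers in deadline order (stable for equal deadlines).
        for _when, callback in sorted(due, key=lambda t: t[0]):
            callback()
        return visited
```

With this approach a jump of years costs the same as a jump of one revolution: every slot is swept once, everything that is due fires in order, and the wheel's notion of "now" moves straight to the target.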
@wingo, WDYT?
Thanks!
@jetomit Here's a proposed workaround on the Guix side: https://issues.guix.gnu.org/64966
This would work for aarch64, but I also encounter this issue on armhf and x86_64 systems. This happens whenever system time is pushed forward by a significant amount (a day or more), either by ntpd or manually.
As I understand it, guile’s `internal-time-units` only depends on the platform and is the same for all clock types. The bigger problem with using `CLOCK_MONOTONIC` might be that it doesn’t count time the system is suspended, which would probably break stuff.
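For reference, the clocks in question can be read side by side. The sketch below uses Python's `time` module rather than Guile; `read_clocks` is a hypothetical helper. On Linux, `CLOCK_BOOTTIME` behaves like `CLOCK_MONOTONIC` but also counts time spent suspended; it is not available on every platform, so the example treats it as optional.

```python
import time


def read_clocks():
    """Return current readings, in seconds, of the clocks discussed here."""
    readings = {
        "realtime": time.time(),        # wall clock; NTP or a user can step it
        "monotonic": time.monotonic(),  # never goes backwards
    }
    # CLOCK_BOOTTIME is Linux-specific, so only read it when it is exposed.
    boottime_id = getattr(time, "CLOCK_BOOTTIME", None)
    if boottime_id is not None and hasattr(time, "clock_gettime"):
        readings["boottime"] = time.clock_gettime(boottime_id)
    return readings
```

Comparing the "monotonic" and "boottime" readings before and after a suspend/resume cycle is one way to check how much suspended time a given platform leaves out of its monotonic clock.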
Another report of `shepherd` spinning once system time has changed: https://issues.guix.gnu.org/66684
@wingo Hello! Did you have a chance to look into that? I'd be happy to try and implement any suggestions you might have (I'd love to do that before Shepherd 1.0 is out).
Took a look at it but it requires a bit of concentration to not introduce bugs :) Do have a look if you like!
Just posting, for the record, another example "in the wild" of someone working around this issue: https://issues.guix.gnu.org/70892#3
Thanks for all your hard work! 😄