Excessive CPU time spent in clock_gettime syscalls under moderate load
jbg opened this issue · 3 comments
Description
Under load testing, we noticed that current JVB master spends most of its CPU time in kernel space, which I found surprising expecting it to be mostly userland and interrupts. Attaching strace
to the busiest thread (thread name: org.jitsi.utils
) gave this report:
# strace -c -p 10018
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
96.62 22.894570 422 54182 clock_gettime
3.35 0.794870 1906 417 206 futex
0.02 0.004398 4398 1 1 restart_syscall
0.00 0.000704 234 3 rt_sigprocmask
0.00 0.000502 71 7 gettid
0.00 0.000355 118 3 rt_sigreturn
------ ----------- ----------- --------- --------- ----------------
100.00 23.695399 433 54613 207 total
It's calling clock_gettime
2286 times per second and spending almost all of its time inside that syscall. This is with just 10 active video senders. There are multiple Global CPU pool
threads also spending most of their time inside clock_gettime
. On this lab system, clock_gettime
system calls account for most of the CPU usage of the JVB and causes it to hit 100% CPU much earlier than it used to as we increase the load.
A separate run with async-profiler
shows:
ns percent samples top
---------- ------- ------- ---
99323573241 82.43% 9573 read_hpet_[k]
Unfortunately we have an issue with JDK debug symbols so can't get the Java part of the relevant stacktraces; they all look like this:
[ 0] read_hpet_[k]
[ 1] ktime_get_ts64_[k]
[ 2] posix_get_monotonic_timespec_[k]
[ 3] __x64_sys_clock_gettime_[k]
[ 4] do_syscall_64_[k]
[ 5] entry_SYSCALL_64_after_hwframe_[k]
[ 6] __vdso_clock_gettime
[ 7] __clock_gettime_2
[ 8] [unknown_Java]
We'll resolve the JDK debug symbols problem and update this issue. We'll also try to bisect this against the JVB repository to find out when the issue began.
This could be caused by a high rate of calls to System.currentTimeMillis()
which uses clock_gettime(CLOCK_REALTIME) internally. A lot has been written in the past about how (relatively) slow this method is, at least on Linux when the clock source is a HPET, and there's a bug report asking for it to be switched to use the much faster clock_gettime(CLOCK_REALTIME_COARSE)
(which still provides sufficient precision for currentTimeMillis
), but it's not had much traction, so the workaround is basically to call it less often or switch to a different clocksource like TSC (which we may experiment with).
It's also plausible that it could be something environmental. Seeing clock_gettime
and futex
at the top of a profile reminded me of the Linux leap second futex bug, but that was fixed years ago. It would be much appreciated if anyone else can try to reproduce these results.
Steps to reproduce
- Put 10 or so people in a conference.
- Observe high System (sy) CPU usage.
- Run
top -H
and observe the PID of the busiest JVB thread. - Run
strace -c -p $PID
and interrupt it after 10-20 seconds. - Observe that
clock_gettime
consumes most of the system CPU time.
Environment details
Xeon E-2278G
Linux 5.14.9
OpenJDK 16.0.2 (tested a few different OpenJDK versions down to 8 and up to 17 with similar results)
Do you have any additional logging (like timeseries) enabled? Can you attach a java profiler?
I can not reproduce this in our environment: sy stays relatively low (~5%) and clock_gettime doesn't show up at all in the strace output (I tried 2 of the CPU pool threads and a few of the IO pool threads).
I can confirm this is somewhat an environmental issue, but it's one that might affect others as well.
Due to either a BIOS bug or a Linux kernel bug, our lab server's TSC clocksource was being marked unstable, and the HPET was being used as the clocksource instead. clock_gettime(CLOCK_REALTIME)
(and thus System.currentTimeMillis()
) is about 20x slower when the clocksource is HPET vs. when it is TSC.
Booting the system with hpet=disable
causes TSC to be selected as clocksource and the problem disappears; CPU usage drops back to normal levels.
So, anyone seeing excessive CPU usage on their JVBs, run:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
If the output is hpet
, try booting with hpet=disable
added to the kernel command line.
Incidentally, even after the fix, once we push a few hundred participants, clock_gettime
is still pretty significant in the profiling results. If there's any easy way to reduce the amount of System.currentTimeMillis()
and similar timekeeping, it could be a win.
🍿 I found this fascinating to read.