Large number of clock_gettime syscalls
hodgesds opened this issue · 7 comments
I noticed a few interesting things when running below in live mode. First, there are roughly 3-4 clock_gettime syscalls per epoll_wait:
strace -f -p `pgrep below`
...
[pid 10545] epoll_wait(7, [], 3, 0) = 0
[pid 10545] clock_gettime(CLOCK_MONOTONIC, {tv_sec=1030, tv_nsec=824761942}) = 0
[pid 10545] clock_gettime(CLOCK_MONOTONIC, {tv_sec=1030, tv_nsec=824806781}) = 0
[pid 10545] clock_gettime(CLOCK_MONOTONIC, {tv_sec=1030, tv_nsec=824850152}) = 0
[pid 10545] clock_gettime(CLOCK_MONOTONIC, {tv_sec=1030, tv_nsec=824894012}) = 0
[pid 10545] epoll_wait(7, [], 3, 0) = 0
[pid 10545] clock_gettime(CLOCK_MONOTONIC, {tv_sec=1030, tv_nsec=824976495}) = 0
[pid 10545] clock_gettime(CLOCK_MONOTONIC, {tv_sec=1030, tv_nsec=825019028}) = 0
[pid 10545] clock_gettime(CLOCK_MONOTONIC, {tv_sec=1030, tv_nsec=825062260}) = 0
[pid 10545] clock_gettime(CLOCK_MONOTONIC, {tv_sec=1030, tv_nsec=825104165}) = 0
[pid 10545] epoll_wait(7, [], 3, 0) = 0
A quick and dirty bpftrace estimate shows ~20k syscalls/sec:
bpftrace -e 'tracepoint:raw_syscalls:sys_enter /comm == "below"/ { @syscalls = count(); }
interval:s:1 { print(@syscalls); clear(@syscalls); }'
Attaching 2 probes...
@syscalls: 20399
@syscalls: 19499
@syscalls: 18809
@syscalls: 19378
@syscalls: 27254
@syscalls: 20061
@syscalls: 19144
@syscalls: 18824
Of those, clock_gettime alone accounts for ~15k syscalls/sec:
bpftrace -e 'tracepoint:syscalls:sys_enter_clock_gettime /comm == "below"/ { @syscalls = count(); }
interval:s:1 { print(@syscalls); clear(@syscalls); }'
Attaching 2 probes...
@syscalls: 15518
@syscalls: 15510
@syscalls: 17648
@syscalls: 14906
@syscalls: 15070
@syscalls: 15713
Not that there is anything wrong with calling clock_gettime, but when I run top/htop alongside below, I see it using anywhere from ~7-12% CPU. From digging around the code, I'm guessing the store crate is where many of these calls come from:
rg -i 'now\(' | cut -d ':' -f 1 | sort | uniq -c
1 below/common/src/cliutil.rs
32 below/common/src/dateutil.rs
2 below/model/src/collector.rs
7 below/src/main.rs
4 below/src/test.rs
1 below/store/src/advance.rs
9 below/store/src/cursor.rs
11 below/store/src/lib.rs
1 below/view/src/lib.rs
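A more direct way to attribute the calls than grepping for now( is to aggregate clock_gettime by user stack with bpftrace (assuming the binary has frame pointers or enough symbols for the stacks to be readable):
bpftrace -e 'tracepoint:syscalls:sys_enter_clock_gettime /comm == "below"/ { @[ustack] = count(); }'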
Hmm, yeah, this isn't ideal. It seems we liberally call some variant of ::now across the codebase, which presumably uses clock_gettime under the hood. We should see if we can improve this.
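For example, one possible mitigation (a rough sketch, not actual below code; all names here are made up) is to read the clock once per refresh tick and pass the cached value to everything that needs "now":

use std::time::{Duration, Instant};

fn main() {
    let mut last_tick = Instant::now();
    loop {
        // The only clock read in this iteration.
        let now = Instant::now();
        let since_last = now - last_tick;
        collect_sample(now);
        render(now, since_last);
        last_tick = now;
        std::thread::sleep(Duration::from_millis(30));
    }
}

// Consumers take the cached Instant instead of calling Instant::now() themselves.
fn collect_sample(_now: Instant) {}
fn render(_now: Instant, _since_last: Duration) {}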
Here's some more data from live view.
CPU/kernel:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
BIOS Vendor ID: Advanced Micro Devices, Inc.
Model name: AMD Ryzen 7 PRO 4750U with Radeon Graphics
BIOS Model name: AMD Ryzen 7 PRO 4750U with Radeon Graphics
CPU family: 23
Model: 96
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 1
Frequency boost: enabled
CPU max MHz: 1700.0000
CPU min MHz: 1400.0000
BogoMIPS: 3393.52
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 256 KiB (8 instances)
L1i: 256 KiB (8 instances)
L2: 4 MiB (8 instances)
L3: 8 MiB (2 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Full AMD retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Srbds: Not affected
Tsx async abort: Not affected
$ uname -a
Linux l1441l 5.16.0-rc2-x86_64 #5 SMP PREEMPT Sun Dec 12 21:32:14 EST 2021 x86_64 AMD Ryzen 7 PRO 4750U with Radeon Graphics AuthenticAMD GNU/Linux
dmesg of clocksource-related events:
dmesg -HkuT | grep -i clock
[Wed Dec 22 08:37:20 2021] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[Wed Dec 22 08:37:20 2021] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
[Wed Dec 22 08:37:20 2021] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x187534688da, max_idle_ns: 440795223786 ns
[Wed Dec 22 08:37:20 2021] ..... CPU clock speed is 1696.0769 MHz.
[Wed Dec 22 08:37:20 2021] ..... host bus clock speed is 99.0809 MHz.
[Wed Dec 22 08:37:20 2021] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[Wed Dec 22 08:37:20 2021] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
[Wed Dec 22 08:37:20 2021] PTP clock support registered
[Wed Dec 22 08:37:20 2021] clocksource: Switched to clocksource tsc-early
[Wed Dec 22 08:37:20 2021] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[Wed Dec 22 08:37:21 2021] clocksource: timekeeping watchdog on CPU1: Marking clocksource 'tsc-early' as unstable because the skew is too large:
[Wed Dec 22 08:37:21 2021] clocksource: 'hpet' wd_nsec: 498437371 wd_now: e018d6 wd_last: 73330a mask: ffffffff
[Wed Dec 22 08:37:21 2021] clocksource: 'tsc-early' cs_nsec: 778421340 cs_now: a1b316e5d cs_last: 9cc77b002 mask: ffffffffffffffff
[Wed Dec 22 08:37:21 2021] clocksource: 'tsc-early' is current clocksource.
[Wed Dec 22 08:37:21 2021] tsc: Marking TSC unstable due to clocksource watchdog
[Wed Dec 22 08:37:21 2021] clocksource: Switched to clocksource hpet
[Wed Dec 22 08:37:21 2021] Unstable clock detected, switching default tracing clock to 'global'
If you want to keep using the local clock, then add:
'trace_clock=local'
Clocksource info (note that tsc is no longer listed as available after the watchdog marked it unstable):
$ cat /sys/devices/system/clocksource/clocksource0/{available,current}_clocksource
hpet acpi_pm
hpet
Some interesting data when switching clock sources:
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
$ bpftrace -e 'tracepoint:syscalls:sys_enter_clock_gettime /comm == "below"/ { @syscalls = count(); }
interval:s:1 { print(@syscalls); clear(@syscalls); }'
Attaching 2 probes...
@syscalls: 17429
@syscalls: 16526
@syscalls: 17092
@syscalls: 15773
@syscalls: 15430
$ echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm
$ bpftrace -e 'tracepoint:syscalls:sys_enter_clock_gettime /comm == "below"/ { @syscalls = count(); }
interval:s:1 { print(@syscalls); clear(@syscalls); }'
Attaching 2 probes...
@syscalls: 11498
@syscalls: 11862
@syscalls: 11918
@syscalls: 13113
@syscalls: 11938
@syscalls: 11978
It seems that with the acpi_pm clocksource there are fewer clock_gettime syscalls. (Note that on x86 only the tsc clocksource is vDSO-capable, so with either hpet or acpi_pm each clock_gettime ends up as a real syscall no matter how it is invoked.) I can confirm the clocksource is actually being changed from dmesg as well:
$ dmesg -HkuT | grep clocksource | tail -n 5
[Wed Dec 22 08:37:21 2021] clocksource: Switched to clocksource hpet
[Wed Dec 22 19:23:30 2021] clocksource: Switched to clocksource acpi_pm
[Wed Dec 22 19:24:57 2021] clocksource: Switched to clocksource hpet
[Wed Dec 22 19:27:42 2021] clocksource: Switched to clocksource acpi_pm
[Wed Dec 22 19:30:54 2021] clocksource: Switched to clocksource hpet
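This clocksource dependence is easy to reproduce outside of below. Here's a minimal standalone sketch (not below's code) that does a million wall-clock reads through the standard library, which go through glibc's clock_gettime:

use std::time::SystemTime;

fn main() {
    let mut acc = 0u128;
    for _ in 0..1_000_000 {
        // Each read goes through glibc clock_gettime; whether that is a
        // real syscall depends on the current clocksource.
        acc ^= SystemTime::now()
            .duration_since(SystemTime::UNIX_EPOCH)
            .unwrap()
            .as_nanos();
    }
    println!("{acc}"); // keep the loop from being optimized away
}

Running it under strace -c -e trace=clock_gettime with each clocksource should show near-zero syscalls on a vDSO-capable clocksource like tsc (glibc answers from the vDSO) and one syscall per read on hpet or acpi_pm.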
As a sanity check, I ran the relevant timens.c kernel selftest:
/usr/src/linux/tools/testing/selftests/timens $ ./timens
1..10
ok 1 Passed for CLOCK_BOOTTIME (syscall)
ok 2 Passed for CLOCK_BOOTTIME (vdso)
ok 3 Passed for CLOCK_BOOTTIME_ALARM (syscall)
ok 4 Passed for CLOCK_BOOTTIME_ALARM (vdso)
ok 5 Passed for CLOCK_MONOTONIC (syscall)
ok 6 Passed for CLOCK_MONOTONIC (vdso)
ok 7 Passed for CLOCK_MONOTONIC_COARSE (syscall)
ok 8 Passed for CLOCK_MONOTONIC_COARSE (vdso)
ok 9 Passed for CLOCK_MONOTONIC_RAW (syscall)
ok 10 Passed for CLOCK_MONOTONIC_RAW (vdso)
# Totals: pass:10 fail:0 xfail:0 xpass:0 skip:0 error:0
Some more perf trace data (note the ~0.005 ms average per clock_gettime call, consistent with each read being a real syscall against a slow clocksource):
$ perf trace --sched -sv -p `pgrep below`
Using CPUID AuthenticAMD-23-60-1
mmap size 528384B
Summary of events:
below (26201), 65586 events, 79.1%, 746.654 msec
syscall calls errors total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- ------ -------- --------- --------- --------- ------
clock_nanosleep 237 0 7169.494 0.000 30.251 30.664 0.42%
clock_gettime 25586 0 117.166 0.000 0.005 0.042 0.25%
epoll_wait 6201 0 21.078 0.000 0.003 0.048 0.66%
openat 58 0 1.379 0.004 0.024 0.050 10.38%
ioctl 58 0 0.225 0.001 0.004 0.008 8.30%
close 58 0 0.165 0.001 0.003 0.005 6.88%
write 2 0 0.017 0.007 0.009 0.010 14.34%
below (26243), 17354 events, 20.9%, 54.356 msec
syscall calls errors total min avg max stddev
(msec) (msec) (msec) (msec) (%)
--------------- -------- ------ -------- --------- --------- --------- ------
poll 8 0 21.783 0.022 2.723 13.213 59.53%
read 2256 6 15.014 0.001 0.007 0.117 2.56%
openat 2127 314 7.403 0.002 0.003 0.067 1.46%
futex 697 105 4.389 0.000 0.006 0.066 4.81%
close 1819 0 2.996 0.001 0.002 0.032 1.46%
clock_gettime 718 0 1.701 0.001 0.002 0.011 1.29%
statx 419 0 1.209 0.002 0.003 0.009 1.30%
readlink 361 302 1.191 0.002 0.003 0.015 1.94%
clone 7 0 0.834 0.031 0.119 0.327 40.91%
getdents64 20 0 0.643 0.001 0.032 0.420 64.12%
rt_sigprocmask 14 0 0.090 0.001 0.006 0.031 39.21%
newfstatat 18 0 0.076 0.002 0.004 0.018 21.80%
pipe2 4 0 0.046 0.003 0.011 0.025 42.03%
wait4 2 0 0.043 0.021 0.021 0.022 1.10%
fcntl 18 0 0.041 0.001 0.002 0.004 9.24%
readlinkat 9 0 0.029 0.003 0.003 0.004 6.20%
prlimit64 12 0 0.023 0.001 0.002 0.003 12.25%
mmap 2 0 0.020 0.009 0.010 0.012 12.98%
munmap 2 0 0.020 0.006 0.010 0.014 41.34%
ioctl 6 0 0.018 0.001 0.003 0.005 18.35%
fstatfs 1 0 0.007 0.007 0.007 0.007 0.00%
uname 1 0 0.002 0.002 0.002 0.002 0.00%
After doing some more digging, it seems the chrono crate doesn't use the vDSO (issue), whereas the standard library should just call into glibc (which uses the vDSO when it can).
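A quick way to check what a given chrono version actually does is to count real syscalls for both paths (a sketch; it requires chrono in Cargo.toml, and the result depends on the pinned version):

fn main() {
    for _ in 0..100_000 {
        let _std = std::time::SystemTime::now(); // glibc clock_gettime, vDSO when available
        let _chrono = chrono::Utc::now();        // path depends on the chrono version
    }
}

Run each variant under strace -c -e trace=clock_gettime,gettimeofday and compare the counts.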
Yeah, not using the vDSO is pretty silly. We either need to make the chrono crate do it or use the standard library implementation.
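For call sites that need chrono types, one option (a sketch, not a committed plan) would be to take the timestamp via std and convert, since chrono implements From<SystemTime> for DateTime<Utc>:

use std::time::SystemTime;

// Take "now" from std (glibc/vDSO-backed) and keep chrono types at the call site.
fn utc_now() -> chrono::DateTime<chrono::Utc> {
    SystemTime::now().into()
}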
@hodgesds Are you installing from an rpm or building from source? If an rpm, is it Fedora? If so, which version of Fedora?
According to chronotope/chrono@5115c6f#diff-3e4364fa6f0aed7a40e68aa12a41235e57f45378bcd8f0094496a7c1d8be0ed0, chrono should be hitting the standard library here, so I would guess the issue is a build against an older version of the chrono crate.
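For a source build, it's easy to check which chrono the lockfile pins and bump it (versions will vary):
$ cargo tree -p chrono
$ cargo update -p chrono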
I'm building from source at head (commit 3b6554f4c4a95d5f7c74cdd7ee6c05bfed47ffe1). I do see the vDSO being (dynamically) linked:
$ ldd target/release/below
linux-vdso.so.1 (0x00007fffc6f71000)
libelf.so.1 => /usr/lib64/libelf.so.1 (0x00007ffa3591c000)
libz.so.1 => /lib64/libz.so.1 (0x00007ffa35902000)
libgcc_s.so.1 => /usr/lib/gcc/x86_64-pc-linux-gnu/11.2.0/libgcc_s.so.1 (0x00007ffa358e7000)
librt.so.1 => /lib64/librt.so.1 (0x00007ffa358dd000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ffa358bd000)
libm.so.6 => /lib64/libm.so.6 (0x00007ffa35788000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007ffa35780000)
libc.so.6 => /lib64/libc.so.6 (0x00007ffa355c0000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffa36156000)
$ objdump -rR target/release/below | grep -i time
000000000087aea8 R_X86_64_GLOB_DAT localtime_r@GLIBC_2.2.5
000000000087b6e8 R_X86_64_GLOB_DAT clock_gettime@GLIBC_2.17
000000000087b8c0 R_X86_64_GLOB_DAT mktime@GLIBC_2.2.5
000000000087df50 R_X86_64_GLOB_DAT timegm@GLIBC_2.2.5
OK, going to close this since building from source behaves as expected (the vDSO path is cheap); this is probably an issue with rpms from some older version of Fedora built against an older chrono crate.