perf statistical profiling (perf record/report) broken on upstream v5.9 kernel (HSDK-4xD)
vineetgarc opened this issue · 2 comments
This came up in recent benchmarking activity. In mainline kernels statistical profiling or perf record is broken
/home/root>perf record ./hackbench
Couldn't synthesize bpf events.
Couldn't synthesize cgroup events.
Running with 10 groups 400 process
Time: 0.977
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 0.038 MB perf.data ]
/home/root>perf report -n --stdio
Error:
The perf.data data has no samples!
# To display the perf.data header info, please use --header/--header-only options.
#
At first glance it seemed like my recent commits to enable perf on HSDK-4xD merged upstream were missing, specifically
2020-07-09 fe81d927b78c ARC: HSDK: wireup perf irq
2020-07-26 feb92d7d3813 ARC: perf: don't bail setup if pct irq missing in device-tree
However this is v5.9 which already has my changes. Initial investigation reveals pct IRQ not getting enabled hence the issue with statistical profiling
/home/root>uname -a
Linux hsdk 5.9.0-00002-g0d2dadadb615-dirty #90 SMP PREEMPT Wed Oct 21 16:38:10 PDT 2020 arc arc arc GNU/Linux
/home/root>cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
6: 325 325 325 325 MCIP IDU Intc 6 ttyS0
10: 3 3 2 2 MCIP IDU Intc 10 eth0
12: 1202 1202 1201 1201 MCIP IDU Intc 12 dw-mci
15: 6 6 5 5 MCIP IDU Intc 15 ehci_hcd:usb1, ohci_hcd:usb2
16: 20850 37391 3050 11074 ARCv2 core Intc 16 Timer0 (per-cpu-tick)
17: 33 33 33 32 MCIP IDU Intc 16 f0020000.spi
19: 28147 10479 15783 12815 ARCv2 core Intc 19 IPI Interrupt
21: 0 0 0 0 ARCv2 core Intc 21 IPI Interrupt
88: 0 0 0 0 MCIP IDU Intc 28 f0090000.gpu
89: 0 0 0 0 MCIP IDU Intc 27 dw_axi_dmac_platform
Added a diagnostic patch (slated for mainline after merge window) 66f825cc25fbfc8, "(ARC: perf: check if pct irq request fails )" which at least shows the root cause.
/home/root>perf record -c 1000
Error:
cycles: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
But I don't know why this is and thus needs to be debugged.
This turned out to be much more mundane and silly than expected.
First of all it was a regression caused by the prior fix for exact same issue, merged in v4.9-rc4, commit feb92d7d3813456c11dce21 "(ARC: perf: don't bail setup if pct irq missing in device-tree)"
Second it was the most basic "C" coding bug almost a newbie to "C" would code. The assignment and comparison in an if statement were not bracketed correctly, leaving the order of evaluation undefined.
if (has_interrupts && (irq = platform_get_irq(pdev, 0) >= 0)) {
^^^ ^^^^
And given such a chance, the compiler will bite you hard, fully entitled to generating this piece of beauty:
bl @platform_get_irq <-- irq returned in r0
setge r2, r0, 0 <-- r2 is bool 1 or 0 depending on irq >= 0 true/false
brlt.d r0, 0, @.L114
st_s r2,[sp] <-- irq saved is bool 1 or 0, not actual return val [NOK]
st 1,[r3,160] # arc_pmu.18_29->irq <-- drops bool and assumes 1 [NOK]
bl.d @__request_percpu_irq;
mov_s r0,1 <-- drops even bool and assumes 1 which fails [NOK]
With the snafu fixed, everything is as expected.
bl @platform_get_irq <-- returns irq in r0
mov_s r2,r0
brlt.d r2, 0, @.L112
st_s r0,[sp] <-- irq isaved is actual return value above
st r0,[r13,160] #arc_pmu.18_27->irq
bl.d @__request_percpu_irq <-- r0 unchanged so actual irq returned
add r4,r4,r12 #, tmp363, __ptr
Fix effectively is
- if (has_interrupts && (irq = platform_get_irq(pdev, 0) >= 0)) {
+ if (has_interrupts && ((irq = platform_get_irq(pdev, 0)) >= 0)) {
But the real commit reworks the code a bit more to simplify setting PERF_PMU_CAP_NO_INTERRUPT
Pull request with fix for 5.10-rc1 sent to Linus
Fix merged upstream: 8c42a5c02bec6c ARC: perf: redo the pct irq missing in device-tree handling