51d334a845a082338735b0fdfc620a4b15fa26fe breaks x86_64 ThinLTO + dynamic ftrace
samitolvanen opened this issue · 17 comments
Starting with LLVM commit 51d334a845a082338735b0fdfc620a4b15fa26fe ("[NFCI] Lazily evaluate SCEVs of PHIs"), ToT x86_64 defconfig + ThinLTO + dynamic ftrace fails to boot:
[ 0.639299] =====================================
[ 0.639718] WARNING: bad unlock balance detected!
[ 0.640137] 5.13.0-rc3+ #1 Not tainted
[ 0.640143] -------------------------------------
[ 0.640143] swapper/0/1 is trying to release lock (
[ 0.640143] general protection fault, probably for non-canonical address 0x25453be8082474ff: 0000 [#1] SMP PTI
[ 0.640143] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc3+ #1
[ 0.640143] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[ 0.640143] RIP: 0010:string+0x50/0x110
[ 0.640143] Code: ce 48 c1 fe 30 85 f6 0f 84 be 00 00 00 31 c0 eb 15 66 0f 1f 84 00 00 00 00 00 48 83 c0 01 39 c6 0f 84 ac 00 00 00 4c 81
[ 0.640143] RSP: 0000:ffff927ec0013af0 EFLAGS: 00010046
[ 0.640143] RAX: 0000000000000000 RBX: ffff927ec0013bea RCX: ffff0a00ffffff04
[ 0.640143] RDX: ffff927ec0013bf0 RSI: ffffffffffffffff RDI: ffff927ec0013bea
[ 0.640143] RBP: ffffffff8a2f044d R08: 25453be8082474ff R09: ffff927ec0013bea
[ 0.640143] R10: 0000001000000000 R11: 0000000400000000 R12: ffff927ec0013bf0
[ 0.640143] R13: ffffffff8a2f044f R14: ffff0a00ffffff04 R15: 0000000000000002
[ 0.640143] FS: 0000000000000000(0000) GS:ffff90ff5f400000(0000) knlGS:0000000000000000
[ 0.640143] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.640143] CR2: ffff90ff56001000 CR3: 0000000014a22000 CR4: 00000000000006f0
[ 0.640143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.640143] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.640143] Call Trace:
[ 0.640143] vsnprintf+0x615/0x750
[ 0.640143] vprintk_store+0xb8/0x4f0
[ 0.640143] ? vprintk_emit+0x103/0x180
[ 0.640143] ? vprintk_emit+0x103/0x180
[ 0.640143] vprintk_emit+0x89/0x180
[ 0.640143] ? _raw_read_lock+0x14/0x50
[ 0.640143] vprintk+0x12e/0x150
[ 0.640143] ? _raw_read_lock+0x14/0x50
[ 0.640143] printk+0x59/0x78
[ 0.640143] print_unlock_imbalance_bug+0x122/0x190
[ 0.640143] ? lockdep_hardirqs_on_prepare+0x187/0x280
[ 0.640143] ? _raw_read_lock+0x14/0x50
[ 0.640143] ? _raw_read_lock+0x14/0x50
[ 0.640143] lock_release+0x18f/0x2a0
[ 0.640143] ? _raw_spin_unlock_irq+0x24/0x30
[ 0.640143] percpu_up_write+0x1a/0x50
[ 0.640143] ? _cpu_up+0x2e/0x260
[ 0.640143] ? cpu_up.llvm.6178664248410513755+0x6a/0xa0
[ 0.640143] ? kernel_init+0xa/0x1b0
[ 0.640143] ? bringup_nonboot_cpus+0x6a/0x70
[ 0.640143] ? smp_init+0x28/0x7f
[ 0.640143] ? kernel_init_freeable+0x147/0x1b5
[ 0.640143] ? rest_init+0x1e0/0x1e0
[ 0.640143] ? kernel_init+0xa/0x1b0
[ 0.640143] ? ret_from_fork+0x22/0x30
[ 0.640143] Modules linked in:
[ 0.640143] Dumping ftrace buffer:
[ 0.640143] (ftrace buffer empty)
[ 0.640143] ---[ end trace a4719c0acf67613d ]---
[ 0.640143] RIP: 0010:string+0x50/0x110
[ 0.640143] Code: ce 48 c1 fe 30 85 f6 0f 84 be 00 00 00 31 c0 eb 15 66 0f 1f 84 00 00 00 00 00 48 83 c0 01 39 c6 0f 84 ac 00 00 00 4c 81
[ 0.640143] RSP: 0000:ffff927ec0013af0 EFLAGS: 00010046
[ 0.640143] RAX: 0000000000000000 RBX: ffff927ec0013bea RCX: ffff0a00ffffff04
[ 0.640143] RDX: ffff927ec0013bf0 RSI: ffffffffffffffff RDI: ffff927ec0013bea
[ 0.640143] RBP: ffffffff8a2f044d R08: 25453be8082474ff R09: ffff927ec0013bea
[ 0.640143] R10: 0000001000000000 R11: 0000000400000000 R12: ffff927ec0013bf0
[ 0.640143] R13: ffffffff8a2f044f R14: ffff0a00ffffff04 R15: 0000000000000002
[ 0.640143] FS: 0000000000000000(0000) GS:ffff90ff5f400000(0000) knlGS:0000000000000000
[ 0.640143] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.640143] CR2: ffff90ff56001000 CR3: 0000000014a22000 CR4: 00000000000006f0
[ 0.640143] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.640143] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.640143] Kernel panic - not syncing: Fatal exception
[ 0.640143] Dumping ftrace buffer:
[ 0.640143] (ftrace buffer empty)
The kernel was configured as follows:
$ make ARCH=x86_64 LLVM=1 LLVM_IAS=1 defconfig
# ThinLTO + dynamic ftrace + CBL CI configs
$ ./scripts/config -e LTO_CLANG_THIN -d LTO_CLANG_FULL -d GCOV_KERNEL -d KASAN -e FTRACE_STARTUP_TEST -e FUNCTION_TRACER -e DYNAMIC_FTRACE -e FTRACE -e TRACING -e LOCK_TORTURE_TEST -e RCU_TORTURE_TEST -e EXPERT -e DEBUG_KERNEL -e LOCK_DEBUGGING_SUPPORT -e DEBUG_RT_MUTEXES -e DEBUG_SPINLOCK -e DEBUG_MUTEXES -e DEBUG_WW_MUTEX_SLOWPATH -e DEBUG_LOCK_ALLOC -e DEBUG_LOCKDEP -e DEBUG_ATOMIC_SLEEP -e CRYPTO_MANAGER_DISABLE_TESTS -e CRC32_SELFTEST -e DEBUG_LOCKING_API_SELFTESTS -e DEBUG_NMI_SELFTEST -e DEBUG_RODATA_TEST -e DEBUG_TLBFLUSH -e DMATEST -e GLOB_SELFTEST -e PCI_EPF_TEST -e PCI_ENDPOINT_TEST -e PROVE_LOCKING -e PROVE_RCU -e RBTREE_TEST -e RCU_EQS_DEBUG -e STATIC_KEYS_SELFTEST -e STRING_SELFTEST -e TEST_BITMAP -e TEST_FIRMWARE -e TEST_SORT -e TEST_SYSCTL -e TEST_UUID -e TORTURE_TEST -e USB_TEST -e USB_EHSET_TEST_FIXTURE -e USB_LINK_LAYER_TEST -e WW_MUTEX_SELFTEST -e BLK_DEV_INITRD -e DEVTMPFS -e DEVTMPFS_MOUNT -e SCSI -e BLK_DEV_SD -e SCSI_LOWLEVEL -e SCSI_DC395x -e SCSI_AM53C974 -e SCSI_SYM53C8XX_2 -e MEGARAID_SAS -e FUSION -e FUSION_SAS -e MMC -e MMC_SDHCI -e MMC_SDHCI_PCI -e BLK_DEV_NVME -e USB -e USB_XHCI_HCD -e USB_EHCI_HCD -e USB_OHCI_HCD -e USB_STORAGE -e USB_UAS -e VIRTIO -e VIRTIO_PCI -e VIRTIO_PCI_LEGACY -e VIRTIO_BALLOON -e VIRTIO_MMIO -e BLK_MQ_VIRTIO -e VIRTIO_BLK -e VIRTIO_BLK_SCSI -e SCSI_VIRTIO
$ make ARCH=x86_64 LLVM=1 LLVM_IAS=1 olddefconfig
During the build I observed the following objtool warnings, which were not present with earlier compiler versions:
LTO vmlinux.o
OBJTOOL vmlinux.o
vmlinux.o: warning: objtool: .text.percpu_down_write: unexpected end of section
vmlinux.o: warning: objtool: .text.try_check_zero: unexpected end of section
Tested with kernel commit 7ac3a1c.
Does your build of LLVM have assertions enabled? It would be interesting to see if there were any new triggers. The diff of those sections before and after that commit might be interesting as well. Does not look like that commit got any precommit review so it should probably be reverted while this is investigated.
Looks like that patch is a precursor to this patch: https://reviews.llvm.org/D102615#2784051
Does your build of LLVM have assertions enabled? It would be interesting to see if there were any new triggers.
After rebuilding LLVM with -DCMAKE_BUILD_TYPE=Debug
and -DLLVM_ENABLE_ASSERTIONS=1
, I don't see any new warnings or errors when building the kernel.
The diff of those sections before and after that commit might be interesting as well.
percpu_down_write.59d938e649e62db0cef4903d495e838fbc6a6eb8.txt
percpu_down_write.51d334a845a082338735b0fdfc620a4b15fa26fe.txt
try_check_zero.59d938e649e62db0cef4903d495e838fbc6a6eb8.txt
try_check_zero.51d334a845a082338735b0fdfc620a4b15fa26fe.txt
No idea how it happens but we seem to have lost the last call to _raw_spin_lock_irqsave
, which I assume is why everything explodes...
Before:
100: 5b pop %rbx
101: 41 5e pop %r14
103: 41 5f pop %r15
105: 5d pop %rbp
106: e9 00 00 00 00 jmpq 10b <percpu_down_write+0x10b>
107: R_X86_64_PLT32 .text.finish_rcuwait+0x5c
10b: e8 00 00 00 00 callq 110 <_raw_spin_lock_irqsave>
10c: R_X86_64_PLT32 schedule-0x4
After:
117: 5b pop %rbx
118: 41 5c pop %r12
11a: 41 5d pop %r13
11c: 41 5e pop %r14
11e: 41 5f pop %r15
120: 5d pop %rbp
121: e9 00 00 00 00 jmpq 126 <i915_pipestat_enable_mask.__warned.9>
122: R_X86_64_PLT32 .text.finish_rcuwait+0x5c
Not sure about the try_check_zero
section warning though.
It's actually the broken commit 51d334a845a082338735b0fdfc620a4b15fa26fe
that has the _raw_spin_lock_irqsave
call at the end of that function:
kernel-build.51d334a845a082338735b0fdfc620a4b15fa26fe/vmlinux.o: file format elf64-x86-64
Disassembly of section .text.percpu_down_write:
0000000000000000 <percpu_down_write>:
...
106: e9 00 00 00 00 jmpq 10b <percpu_down_write+0x10b>
107: R_X86_64_PLT32 .text.finish_rcuwait+0x5c
10b: e8 00 00 00 00 callq 110 <_raw_spin_lock_irqsave>
10c: R_X86_64_PLT32 schedule-0x4
59d938e649e62db0cef4903d495e838fbc6a6eb8
is the previous commit, which still produces a working kernel.
Also note that the disassembly above is after objtool. Let me rebuild and see if there's something obvious that trips it up.
Starting with LLVM commit 51d334a845a082338735b0fdfc620a4b15fa26fe ("[NFCI] Lazily evaluate SCEVs of PHIs"), ToT x86_64 defconfig + ThinLTO + dynamic ftrace fails to boot
I guess it wasn't "No Functional Change Intended" then.
Looks like objtool just correctly removed the __fentry__
call in both cases. Otherwise the functions are identical before and after. For example:
0000000000000000 <percpu_down_write>:
- 0: e8 00 00 00 00 callq 5 <percpu_down_write+0x5>
- 1: R_X86_64_PLT32 __fentry__-0x4
+ 0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
+ 1: R_X86_64_NONE __fentry__-0x4
5: 55 push %rbp
6: 41 57 push %r15
8: 41 56 push %r14
No functional change was intended, must be a bug. Patch reverted. Sorry for disturbance!
Please confirm the failure is gone with
commit 6a2af607ad3523ddc3778b0efb7bb1d5d42a1edb (HEAD -> main, origin/main)
Author: Max Kazantsev <mkazantsev@azul.com>
Date: Fri May 28 11:05:30 2021 +0700
Revert "[NFCI] Lazily evaluate SCEVs of PHIs"
This reverts commit 51d334a845a082338735b0fdfc620a4b15fa26fe.
Reported failures, need to analyze.
Thanks, Max. I can confirm the kernel boots again after the revert.
Can we try to get a reduced test case for @max-azul ?
Looks like https://reviews.llvm.org/D102615 has a new revision. I will try to test that later today to make sure none of the LTO/CFI builds are broken with it.
https://reviews.llvm.org/D102615 passes a regular defconfig, defconfig + CONFIG_LTO_CLANG_THIN=y, and defconfig + CONFIG_LTO_CLANG_THIN=y + CONFIG_CFI_CLANG=y for arm64 and x86_64 at llvm/llvm-project@cfa9ae9 and samitolvanen@29d7af2.
Guessing that's Diff 6 | 348875
.
Nathan, did you test that also with dynamic ftrace? Normal defconfig + LTO was fine with the earlier patch too.
@samitolvanen sorry, I missed testing with CONFIG_FUNCTION_TRACER=y
, which should test CONFIG_DYNAMIC_FTRACE
. I will do that right now.
Looks good to me with CONFIG_FUNCTION_TRACER=y
as well.