ClangBuiltLinux/linux

Stack alignment plugin option boot failure with CONFIG_CFI_CLANG

nathanchance opened this issue · 11 comments

My commit 0024430e920f ("x86/build: Fix location of '-plugin-opt=' flags") prevents my WSL2 virtual machine from booting on my Intel-based laptop, even though my AMD-based workstation has no issues with the exact same binary.

So far, I have verified that:

  1. The issue is not present with just CONFIG_LTO_CLANG_THIN, only with CONFIG_LTO_CLANG_THIN + CONFIG_CFI_CLANG (regardless of CONFIG_CFI_PERMISSIVE).
  2. The issue is not present with CONFIG_CFI_CLANG and the following diff:
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 307529417021..3ecc786f8144 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -200,8 +200,7 @@ endif
 KBUILD_LDFLAGS += -m elf_$(UTS_MACHINE)

 ifdef CONFIG_LTO_CLANG
-KBUILD_LDFLAGS += -plugin-opt=-code-model=kernel \
-                  -plugin-opt=-stack-alignment=$(if $(CONFIG_X86_32),4,8)
+KBUILD_LDFLAGS += -plugin-opt=-code-model=kernel
 endif

 ifdef CONFIG_X86_NEED_RELOCS

Unfortunately, getting kernel logs from WSL2 seems impossible, but I will continue to research that. QEMU within WSL2 does not reproduce the issue, so I will see if I can reproduce it some other way. This does not appear to be LLVM version dependent because I can reproduce it with LLVM 12.0.0, but I can try earlier versions to see.

cc @samitolvanen

I wonder if, with LTO, we're setting the stack alignment on an object file that does not get it without LTO. That is, without LTO we don't set the stack alignment to 8 for a particular object file (probably by accident); then with LTO we do set it, and something somewhere breaks. To test that hypothesis, I'd dump the output from make ... V=1 &> log.txt and see if any non-host object file was built without the stack alignment flag. Does full LTO make any difference? Otherwise it sounds like CFI is required to reproduce, though the problematic flag seems more specific to LTO than CFI.
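One rough way to do that log check is sketched below. It is a heuristic, not a definitive tool: it assumes compile commands in the V=1 log start with a clang binary and contain both -c and -o, and it assumes host tools can be recognized by output paths under scripts/ or tools/. Both assumptions may need adjusting for a given tree.

```python
import re
import shlex

# Matches the compiler flag discussed above, e.g. -mstack-alignment=8.
STACK_ALIGN = re.compile(r"^-mstack-alignment=\d+$")

# Assumption: outputs under these prefixes are host tools, not kernel objects.
HOST_PREFIXES = ("scripts/", "tools/")


def objects_missing_stack_alignment(log_lines):
    """Return output paths of clang compile commands lacking the flag."""
    missing = []
    for line in log_lines:
        try:
            words = shlex.split(line)
        except ValueError:
            # Unbalanced quotes: not a command line we can tokenize.
            continue
        if not words or "-c" not in words or "-o" not in words:
            continue
        # Assumption: the compiler is the first word and its basename
        # contains "clang" (covers paths and version suffixes).
        if "clang" not in words[0].rsplit("/", 1)[-1]:
            continue
        out_idx = words.index("-o") + 1
        if out_idx >= len(words):
            continue
        output = words[out_idx]
        if output.startswith(HOST_PREFIXES):
            continue
        if not any(STACK_ALIGN.match(w) for w in words):
            missing.append(output)
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python3 check_align.py log.txt  (script name is hypothetical)
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as f:
            for path in objects_missing_stack_alignment(f):
                print(path)
```

An empty result would suggest every non-host object already gets the flag without LTO, which would argue against the hypothesis.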

Good thoughts. I will try looking at the build logs as well as trying to reproduce with full LTO. If that fails, I will try to see if I can bisect down to a folder/object file by removing the CFI flags.

I wasn't able to reproduce this on qemu with defconfig + ThinLTO + CFI. Can you share your config? Hopefully this is not hardware specific.

here you go: https://gist.github.com/nathanchance/0b3b29399544963f4d0bc271d99f1cda

Thanks. This config still boots in qemu for me, so might be something specific to WSL2. I also looked at the build log and didn't see anything built with LTO that wouldn't also have -mstack-alignment=8.

Turns out CONFIG_LTO_CLANG_FULL=y also reproduces this without CONFIG_CFI_CLANG=y but I still cannot reproduce in QEMU so I am trying to use full Hyper-V; otherwise, I will have to try bisecting by disabling LTO in places or figuring out a way to get logs from WSL2.

I applied https://git.kernel.org/linus/0024430e920f2900654ad83cd081cf52e02a3ef5 on top of v5.12 and that kernel boots up no problem. 0024430e920f2 on top of v5.13-rc1 does not boot. My bisect drilled down to 9bc0bb50727c8ac69fbb33fb937431cf3518ff37:

$ git bisect log
# bad: [6efb943b8616ec53a5e444193dccf1af9ad627b5] Linux 5.13-rc1
# good: [9f4ad9e425a1d3b6a34617b8ea226d56a119a717] Linux 5.12
git bisect start 'v5.13-rc1' 'v5.12'
# bad: [71a5cc28e88b0db69c3f83d4061ad4cc684af09f] Merge tag 'mfd-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
git bisect bad 71a5cc28e88b0db69c3f83d4061ad4cc684af09f
# good: [2a19866b6e4cf554b57660549d12496ea84aa7d7] Merge tag '5.12-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6
git bisect good 2a19866b6e4cf554b57660549d12496ea84aa7d7
# good: [a1a1ca70deb3ec600eeabb21de7f3f48aaae5695] Merge tag 'drm-misc-next-fixes-2021-04-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
git bisect good a1a1ca70deb3ec600eeabb21de7f3f48aaae5695
# bad: [3aa139aa9fdc138a84243dc49dc18d9b40e1c6e4] Merge tag 'media/v5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect bad 3aa139aa9fdc138a84243dc49dc18d9b40e1c6e4
# good: [fafe1e39ed213221c0bce6b0b31669334368dc97] Merge tag 'afs-netfs-lib-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
git bisect good fafe1e39ed213221c0bce6b0b31669334368dc97
# good: [036673a7231decf66d8d73dfcf0afd375de31f6e] dt-bindings: i3c: update i3c.yaml references
git bisect good 036673a7231decf66d8d73dfcf0afd375de31f6e
# bad: [55e6be657b8d774d9a2e67363e5bcbbaf80fdc28] Merge branch 'for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
git bisect bad 55e6be657b8d774d9a2e67363e5bcbbaf80fdc28
# bad: [c6536676c7fe3f572ba55842e59c3c71c01e7fb3] Merge tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad c6536676c7fe3f572ba55842e59c3c71c01e7fb3
# good: [b1f480bc0686e65d5413c035bd13af2ea4888784] Merge branch 'x86/cpu' into WIP.x86/core, to merge the NOP changes & resolve a semantic conflict
git bisect good b1f480bc0686e65d5413c035bd13af2ea4888784
# good: [e359bce39d9085ab24eaa0bb0778bb5f6894144a] Merge tag 'audit-pr-20210426' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
git bisect good e359bce39d9085ab24eaa0bb0778bb5f6894144a
# good: [9a7827b7789c630c1efdb121daa42c6e77dce97f] objtool: Extract elf_symbol_add()
git bisect good 9a7827b7789c630c1efdb121daa42c6e77dce97f
# bad: [9bc0bb50727c8ac69fbb33fb937431cf3518ff37] objtool/x86: Rewrite retpoline thunk calls
git bisect bad 9bc0bb50727c8ac69fbb33fb937431cf3518ff37
# good: [43d5430ad74ef5156353af7aec352426ec7a8e57] objtool: Keep track of retpoline call sites
git bisect good 43d5430ad74ef5156353af7aec352426ec7a8e57
# good: [50e7b4a1a1b264fc7df0698f2defb93cadf19a7b] objtool: Skip magical retpoline .altinstr_replacement
git bisect good 50e7b4a1a1b264fc7df0698f2defb93cadf19a7b
# first bad commit: [9bc0bb50727c8ac69fbb33fb937431cf3518ff37] objtool/x86: Rewrite retpoline thunk calls
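For anyone wanting to double-check a bisect log like the one above, here is a minimal sketch that extracts the good/bad marks and the first bad commit. It assumes full 40-character hashes, as in this log; the function name is made up for illustration.

```python
import re

# "git bisect good <sha>" / "git bisect bad <sha>" lines.
MARK = re.compile(r"^git bisect (good|bad) ([0-9a-f]{40})$")
# The final "# first bad commit: [<sha>] <subject>" line.
FIRST_BAD = re.compile(r"^# first bad commit: \[([0-9a-f]{40})\] (.*)$")


def summarize_bisect_log(text):
    """Return (marks, first_bad) parsed from `git bisect log` output.

    marks is a list of (verdict, sha) tuples in log order;
    first_bad is (sha, subject), or None if the bisect did not finish.
    """
    marks = []
    first_bad = None
    for raw in text.splitlines():
        line = raw.strip()
        m = MARK.match(line)
        if m:
            marks.append((m.group(1), m.group(2)))
            continue
        m = FIRST_BAD.match(line)
        if m:
            first_bad = (m.group(1), m.group(2))
    return marks, first_bad
```

Feeding it the log above should report 9bc0bb50727c8ac69fbb33fb937431cf3518ff37 as the first bad commit, with the last "bad" mark pointing at the same hash.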

Reverting 9bc0bb50727c8ac69fbb33fb937431cf3518ff37 and its follow-up fix, even on -next, gets me back to a bootable state.