ClangBuiltLinux/linux

RISC-V linker relaxation breaks booting with LTO

Opened this issue · 7 comments

I forgot to boot test v3 of RISC-V LTO enablement and to my surprise, kernels built with LTO and LLVM 15.0.0 and newer were broken (do not get past OpenSBI). I did a bisect of LLVM and landed on the commit that added support for RISC-V linker relaxation to ld.lld: llvm/llvm-project@6611d58

# Apply https://lore.kernel.org/20231003-riscv-lto-v3-1-8aca61a4ecb4@kernel.org/ first then turn on CONFIG_LTO_CLANG_THIN in menuconfig
$ make -skj"$(nproc)" ARCH=riscv LLVM=1 mrproper defconfig menuconfig Image

$ boot-qemu.py -k . -t 20s
...
OpenSBI v1.3.1
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|___/_____|
        | |
        |_|

Platform Name             : riscv-virtio,qemu
Platform Features         : medeleg
...
Boot HART ID              : 0
Boot HART Domain          : root
Boot HART Priv Version    : v1.12
Boot HART Base ISA        : rv64imafdch
Boot HART ISA Extensions  : time,sstc
Boot HART PMP Count       : 16
Boot HART PMP Granularity : 4
Boot HART PMP Address Bits: 54
Boot HART MHPM Count      : 16
Boot HART MIDELEG         : 0x0000000000001666
Boot HART MEDELEG         : 0x0000000000f0b509
qemu-system-riscv64: terminating on signal 15 from pid 2261906 (timeout)
...

With this diff applied on top:

diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 1329e060c548..c65e9cff59f4 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -45,7 +45,6 @@ else
 endif

 ifeq ($(CONFIG_LD_IS_LLD),y)
-ifeq ($(call test-lt, $(CONFIG_LLD_VERSION), 150000),y)
        KBUILD_CFLAGS += -mno-relax
        KBUILD_AFLAGS += -mno-relax
 ifndef CONFIG_AS_IS_LLVM
@@ -53,7 +52,6 @@ ifndef CONFIG_AS_IS_LLVM
        KBUILD_AFLAGS += -Wa,-mno-relax
 endif
 endif
-endif

 # ISA string setting
 riscv-march-$(CONFIG_ARCH_RV32I)       := rv32ima

The machine boots again:

$ make -skj"$(nproc)" ARCH=riscv LLVM=1 clean Image

$ boot-qemu.py -k . -t 20s
...
OpenSBI v1.3.1
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|___/_____|
        | |
        |_|

Platform Name             : riscv-virtio,qemu
Platform Features         : medeleg
...
Boot HART ID              : 0
Boot HART Domain          : root
Boot HART Priv Version    : v1.12
Boot HART Base ISA        : rv64imafdch
Boot HART ISA Extensions  : time,sstc
Boot HART PMP Count       : 16
Boot HART PMP Granularity : 4
Boot HART PMP Address Bits: 54
Boot HART MHPM Count      : 16
Boot HART MIDELEG         : 0x0000000000001666
Boot HART MEDELEG         : 0x0000000000f0b509
[    0.000000] Linux version 6.6.0-rc4-00002-g81137efe9431-dirty (nathan@dev-arch.thelio-3990X) (ClangBuiltLinux clang version 18.0.0 (https://github.com/llvm/llvm-project 9a954c693573281407f6ee3f4eb1b16cc545033d), ClangBuiltLinux LLD 18.0.0) #4 SMP Thu Oct  5 13:39:05 MST 2023
[    0.000000] random: crng init done
[    0.000000] Machine model: riscv-virtio,qemu
[    0.000000] SBI specification v1.0 detected
[    0.000000] SBI implementation ID=0x1 Version=0x10003
[    0.000000] SBI TIME extension detected
[    0.000000] SBI IPI extension detected
[    0.000000] SBI RFENCE extension detected
[    0.000000] SBI SRST extension detected
...

I will see if I can narrow down if this happens in a particular translation unit. In the meantime, if we wanted to workaround this in the kernel, we could add something like this to v4:

diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 1329e060c548..ecaa784e8198 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -45,7 +45,7 @@ else
 endif

 ifeq ($(CONFIG_LD_IS_LLD),y)
-ifeq ($(call test-lt, $(CONFIG_LLD_VERSION), 150000),y)
+ifeq ($(filter y, $(call test-lt, $(CONFIG_LLD_VERSION), 150000) $(CONFIG_LTO_CLANG)),y)
        KBUILD_CFLAGS += -mno-relax
        KBUILD_AFLAGS += -mno-relax
 ifndef CONFIG_AS_IS_LLVM
twd2 commented

I can reproduce this issue with Linux 6.5 and LLVM commit 0ce6255a5058.

After some debugging using QEMU, I found the return value of ss->css_alloc(NULL) (where it points to cpuset_css_alloc()) in cgroup_init_subsys() was NULL and later triggered an unexpected page fault.

The source code of cpuset_css_alloc() is:

static struct cgroup_subsys_state *
cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
{
	struct cpuset *cs;

	if (!parent_css)
		return &top_cpuset.css;
// ...

Disassembling cpuset_css_alloc() in vmlinux.o shows:

0000000000000000 <cpuset_css_alloc>:
   0:   1101                    addi    sp,sp,-32
   2:   ec06                    sd      ra,24(sp)
   4:   e822                    sd      s0,16(sp)
   6:   e426                    sd      s1,8(sp)
   8:   1000                    addi    s0,sp,32

000000000000000a <.Lpcrel_hi14724>:
   a:   00000597                auipc   a1,0x0
                        a: R_RISCV_PCREL_HI20   top_cpuset
   e:   00058593                mv      a1,a1
                        e: R_RISCV_PCREL_LO12_I .Lpcrel_hi14724
  12:   c935                    beqz    a0,86 <.LBB4160_4>

0000000000000014 <.Lpcrel_hi14725>:
...

0000000000000086 <.LBB4160_4>:
  86:   852e                    mv      a0,a1
  88:   60e2                    ld      ra,24(sp)
  8a:   6442                    ld      s0,16(sp)
  8c:   64a2                    ld      s1,8(sp)
  8e:   6105                    addi    sp,sp,32
  90:   8082                    ret

Note that there is no relocation entry (like llvm/llvm-project#59350) for the beqz, which makes the target of the beqz wrong after linker relaxation.

And disassembling cpuset_css_alloc() in vmlinux shows:

ffffffff800e84fc <cpuset_css_alloc>:
ffffffff800e84fc:       1101                    addi    sp,sp,-32
ffffffff800e84fe:       ec06                    sd      ra,24(sp)
ffffffff800e8500:       e822                    sd      s0,16(sp)
ffffffff800e8502:       e426                    sd      s1,8(sp)
ffffffff800e8504:       1000                    addi    s0,sp,32

ffffffff800e8506 <.Lpcrel_hi14724>:
ffffffff800e8506:       013a3597                auipc   a1,0x13a3
ffffffff800e850a:       ae258593                addi    a1,a1,-1310 # ffffffff8148afe8 <top_cpuset>
ffffffff800e850e:       c935                    beqz    a0,ffffffff800e8582 <.LBB4160_4+0x2>

ffffffff800e8510 <.Lpcrel_hi14725>:
...

ffffffff800e8580 <.LBB4160_4>:
ffffffff800e8580:       852e                    mv      a0,a1
ffffffff800e8582:       60e2                    ld      ra,24(sp)
ffffffff800e8584:       6442                    ld      s0,16(sp)
ffffffff800e8586:       64a2                    ld      s1,8(sp)
ffffffff800e8588:       6105                    addi    sp,sp,32
ffffffff800e858a:       8082                    ret

Note that the target of the beqz is wrong and mv a0,a1 is skipped, which is exactly the same issue as llvm/llvm-project#65090.

According to @MaskRay 's comment in llvm/llvm-project#59350, I tried the following diff:

diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 6ec6d52a4180..c8efe12b9238 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -55,6 +55,10 @@ endif
 endif
 endif
 
+ifdef CONFIG_LTO_CLANG
+       KBUILD_LDFLAGS += -mllvm -mattr=+c -mllvm -mattr=+relax
+endif
+

(something like llvm/llvm-project#50505 ?)

And the kernel can boot now.

Oh! Nice work!

(something like llvm/llvm-project#50505 ?)

Exactly!

  •   KBUILD_LDFLAGS += -mllvm -mattr=+c -mllvm -mattr=+relax
    

I thought llvm/llvm-project#50505 addressed -mattr=+c? Perhaps only relax is necessary?

I'm going to reopen llvm/llvm-project#50505. The above diff should be folded into the LTO series with a comment that it's working around that bug in LLVM.

@twd2 Would you like to send v4 with that diff added or would you like me to? I do not mind either way.

twd2 commented

Oh! Nice work!

(something like llvm/llvm-project#50505 ?)

Exactly!

  •   KBUILD_LDFLAGS += -mllvm -mattr=+c -mllvm -mattr=+relax
    

I thought llvm/llvm-project#50505 addressed -mattr=+c? Perhaps only relax is necessary?

I checked the bitcode and IR and found that +c and +relax were indeed encoded in the IR's target-features:

attributes #2 = { ... "target-features"="+64bit,+a,+c,+m,+relax,...

But ld.lld somehow ignored these attributes when doing LTO. (?)

I thought the diff I referenced in llvm/llvm-project#50505 (llvm/llvm-project@e63455d) only solved the issue when writing nops. And, if we don't pass +c, e_flags of the ELF header will not contain EF_RISCV_RVC (MaskRay said in llvm/llvm-project#59350).

I'm going to reopen llvm/llvm-project#50505. The above diff should be folded into the LTO series with a comment that it's working around that bug in LLVM.

So, can you please change the title of llvm/llvm-project#50505 to something like "LLD ignores target-features when doing LTO"? I think this is the root cause of 50505 and this issue.

Thanks.

twd2 commented

@twd2 Would you like to send v4 with that diff added or would you like me to? I do not mind either way.

Please do that, thanks!

I thought there needed some if-conditions to check whether we should add +c or +relax, but I have not come up with a good solution.

@hiraditya has been tracking this kind of issue with LTO and riscv for Android. He pointed me to the discussion in:

https://discourse.llvm.org/t/myterious-soft-float-output-in-lto-cache/70753