arbitrary_self_types_pointers_and_wrappers fails on aarch64 due to stack corruption with libgcc atomics
liamnaddell opened this issue · 10 comments
On aarch64, during the initialization of std, a compare_exchange_weak is called as part of std::thread::ThreadId::new, specifically line sysroot_src/library/std/src/thread/mod.rs:1190.
If I'm not mistaken, this calls out to a libgcc-implemented intrinsic, __aarch64_cas8_relax.
These intrinsics are documented here: https://github.com/llvm/llvm-project/blob/main/llvm/docs/Atomics.rst#libcalls-atomic
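For context, the kind of counter allocation that leads to this call can be sketched as below. This is a minimal illustration of a compare_exchange_weak retry loop, not the actual std source; the names COUNTER and next_id are hypothetical and chosen only for this example.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical global counter, standing in for the one used by ThreadId::new.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Allocate the next ID with a weak compare-and-swap retry loop.
fn next_id() -> u64 {
    let mut last = COUNTER.load(Ordering::Relaxed);
    loop {
        let id = last.checked_add(1).expect("ID space exhausted");
        // A weak CAS may fail spuriously, so a failure just updates `last` and retries.
        match COUNTER.compare_exchange_weak(last, id, Ordering::Relaxed, Ordering::Relaxed) {
            Ok(_) => return id,
            Err(current) => last = current,
        }
    }
}

fn main() {
    let first = next_id();
    let second = next_id();
    println!("allocated IDs: {first}, {second}");
}
```

On a 64-bit counter, the CAS in this loop is the operation that would map to an 8-byte helper such as __aarch64_cas8_relax when the compiler emits a libcall instead of inline instructions.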
What appears to be happening is that this intrinsic modifies $sp (presumably to return some argument to the caller?), but the generated Rust code does not expect $sp to change, resulting in the stored return address being set to a bogus value. When the frame is popped, we branch to some random value on the stack. In GDB, this shows up as a branch to the "function" std::thread::ThreadId::new::COUNTER, whose "opcodes" are 0, which decodes to the udf instruction on Arm, causing a segfault (or a bus error if we branch to 0x1, which appears in some other examples).
I've attached a reproducer that only depends on core to provide the intrinsic. The reproducer shows the following GDB log:
liam@gentoo ~/rustc_codegen_gcc $ gdb target/out/reproduce_core
GNU gdb (Gentoo 14.2 vanilla) 14.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from target/out/reproduce_core...
(gdb) b __aarch64_cas4_relax
Breakpoint 1 at 0xcf8
(gdb) run
Starting program: /home/liam/rustc_codegen_gcc/target/out/reproduce_core
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Breakpoint 1, 0x0000aaaaaaaa0cf8 in __aarch64_cas4_relax (param0=0, param1=1, param2=0xffffffffee64)
(gdb) disassemble
Dump of assembler code for function __aarch64_cas4_relax:
0x0000aaaaaaaa0ce4 <+0>: sub sp, sp, #0x10
0x0000aaaaaaaa0ce8 <+4>: str w0, [sp, #12]
0x0000aaaaaaaa0cec <+8>: str w1, [sp, #8]
0x0000aaaaaaaa0cf0 <+12>: str x2, [sp]
0x0000aaaaaaaa0cf4 <+16>: mov w16, w0
=> 0x0000aaaaaaaa0cf8 <+20>: ldxr w0, [x2]
0x0000aaaaaaaa0cfc <+24>: cmp w0, w16
0x0000aaaaaaaa0d00 <+28>: b.ne 0xaaaaaaaa0d0c <__aarch64_cas4_relax+40> // b.any
0x0000aaaaaaaa0d04 <+32>: stxr w17, w1, [x2]
0x0000aaaaaaaa0d08 <+36>: cbnz w17, 0xaaaaaaaa0cf8 <__aarch64_cas4_relax+20>
0x0000aaaaaaaa0d0c <+40>: ret
End of assembler dump.
(gdb) bt
#0 0x0000aaaaaaaa0cf8 in __aarch64_cas4_relax (param0=0, param1=1, param2=0xffffffffee64)
#1 0x0000aaaaaaaa07cc in reproduce_core::perform_bad ()
#2 0x0000aaaaaaaa0888 in main ()
(gdb) si
0x0000aaaaaaaa0d08 in __aarch64_cas4_relax (param0=0, param1=1, param2=0xffffffffee64)
(gdb) bt
#0 0x0000aaaaaaaa0d08 in __aarch64_cas4_relax (param0=0, param1=1, param2=0xffffffffee64)
#1 0x0000aaaaaaaa07cc in reproduce_core::perform_bad ()
#2 0x0000aaaaaaaa0888 in main ()
(gdb) set disassemble-next-line on
(gdb) si
0x0000aaaaaaaa0d0c in __aarch64_cas4_relax (param0=0, param1=1, param2=0xffffffffee64)
=> 0x0000aaaaaaaa0d0c <__aarch64_cas4_relax+40>: d65f03c0 ret
(gdb)
0x0000aaaaaaaa07cc in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07cc <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+56>: 6b14001f cmp w0, w20
(gdb) x/6xg $sp
0xffffffffee30: 0x0000ffffffffee64 0x0000000000000001
0xffffffffee40: 0x0000ffffffffee80 0x0000aaaaaaaa0888
0xffffffffee50: 0x0000fffffffff038 0x0000000000000001
(gdb) si
0x0000aaaaaaaa07d0 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07d0 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+60>: 2a0003e1 mov w1, w0
(gdb)
0x0000aaaaaaaa07d4 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07d4 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+64>: 1a9f17e0 cset w0, eq // eq = none
(gdb)
0x0000aaaaaaaa07d8 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07d8 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+68>: 7100001f cmp w0, #0x0
(gdb)
0x0000aaaaaaaa07dc in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07dc <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+72>: 54000041 b.ne 0xaaaaaaaa07e4 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+80> // b.any
(gdb)
0x0000aaaaaaaa07e4 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07e4 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+80>: 3900ffe0 strb w0, [sp, #63]
(gdb)
0x0000aaaaaaaa07e8 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07e8 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+84>: 9100a3e0 add x0, sp, #0x28
(gdb)
0x0000aaaaaaaa07ec in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07ec <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+88>: b94033e1 ldr w1, [sp, #48]
(gdb)
0x0000aaaaaaaa07f0 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07f0 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+92>: b9000001 str w1, [x0]
(gdb)
0x0000aaaaaaaa07f4 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07f4 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+96>: 9100a3e0 add x0, sp, #0x28
(gdb)
0x0000aaaaaaaa07f8 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07f8 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+100>: 91001000 add x0, x0, #0x4
(gdb)
0x0000aaaaaaaa07fc in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa07fc <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+104>: 3940ffe1 ldrb w1, [sp, #63]
(gdb)
0x0000aaaaaaaa0800 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0800 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+108>: 39000001 strb w1, [x0]
(gdb)
0x0000aaaaaaaa0804 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0804 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+112>: 9100a3e0 add x0, sp, #0x28
(gdb)
0x0000aaaaaaaa0808 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0808 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+116>: b9400000 ldr w0, [x0]
(gdb)
0x0000aaaaaaaa080c in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa080c <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+120>: b9003be0 str w0, [sp, #56]
(gdb)
0x0000aaaaaaaa0810 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0810 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+124>: 9100a3e0 add x0, sp, #0x28
(gdb)
0x0000aaaaaaaa0814 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0814 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+128>: 91001000 add x0, x0, #0x4
(gdb)
0x0000aaaaaaaa0818 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0818 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+132>: 39400000 ldrb w0, [x0]
(gdb)
0x0000aaaaaaaa081c in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa081c <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+136>: 3900d3e0 strb w0, [sp, #52]
(gdb)
0x0000aaaaaaaa0820 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0820 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+140>: d2800060 mov x0, #0x3 // #3
(gdb)
0x0000aaaaaaaa0824 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0824 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+144>: a94153f3 ldp x19, x20, [sp, #16]
(gdb)
0x0000aaaaaaaa0828 in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa0828 <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+148>: a8c47bfd ldp x29, x30, [sp], #64
(gdb) x/6xg $sp
0xffffffffee30: 0x0000ffffffffee64 0x0000000000000001
0xffffffffee40: 0x0000ffffffffee80 0x0000aaaaaaaa0888
0xffffffffee50: 0x0000fffffffff038 0x0000000100000000
(gdb) si
0x0000aaaaaaaa082c in reproduce_core::perform_bad ()
=> 0x0000aaaaaaaa082c <_ZN14reproduce_core11perform_bad17h8646925cf068046aE+152>: d65f03c0 ret
(gdb) p/x $x30
$2 = 0x1
(gdb) # uh oh
(gdb) si
0x0000000000000001 in ?? ()
=> 0x0000000000000001:
Cannot access memory at address 0x1
(gdb)
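One way to read the log above, assuming the `sub sp, sp, #0x10` at the top of __aarch64_cas4_relax is never undone before its `ret`: the caller's `ldp x29, x30, [sp], #64` then reads its frame record 16 bytes too low. The sketch below replays that load against the words from the `x/6xg $sp` dump. The helper name load_frame_record is hypothetical, and the per-word comments are an interpretation of the log, not something GDB reported.

```rust
// First four 8-byte words copied from the `x/6xg $sp` output in the log.
const DUMP_BASE: u64 = 0xffff_ffff_ee30;
const DUMP: [u64; 4] = [
    0x0000_ffff_ffff_ee64, // [sp]      matches the helper's spilled x2 (pointer argument)
    0x0000_0000_0000_0001, // [sp + 8]  matches the helper's spilled w1 = 1 and w0 = 0
    0x0000_ffff_ffff_ee80, // [sp + 16] plausibly the caller's saved x29
    0x0000_aaaa_aaaa_0888, // [sp + 24] the address shown for frame #2 (main) in `bt`
];

/// Model the load half of `ldp x29, x30, [sp], #64`: two consecutive words at `sp`.
fn load_frame_record(sp: u64) -> (u64, u64) {
    let idx = ((sp - DUMP_BASE) / 8) as usize;
    (DUMP[idx], DUMP[idx + 1])
}

fn main() {
    // sp as observed after the helper returned.
    let observed_sp = DUMP_BASE;
    // sp if the helper's 0x10-byte adjustment had been undone (assumption).
    let expected_sp = observed_sp + 0x10;

    let (_, lr_observed) = load_frame_record(observed_sp);
    let (fp_expected, lr_expected) = load_frame_record(expected_sp);

    println!("observed x30 = {lr_observed:#x}");
    println!("expected x29 = {fp_expected:#x}, x30 = {lr_expected:#x}");
}
```

Under that assumption, the observed load yields the same 0x1 that `p/x $x30` prints in the log, while the load at the adjusted address would have produced a return address inside main.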
ENV INFO:
$ cat config.toml
gcc-path = "/home/liam/rustc_gcc/gcc-install/lib"
#download-gccjit = true
liam@gentoo ~/rustc_gcc/gcc $ git remote get-url origin
https://github.com/antoyo/gcc
liam@gentoo ~/rustc_gcc/gcc $ git status
On branch master
Your branch is up to date with 'origin/master'.
Reproducer:
#![feature(
    core_intrinsics, unboxed_closures, start, lang_items, never_type, linkage,
    extern_types, thread_local
)]
#![no_std]
#![allow(dead_code, internal_features, non_camel_case_types)]
#![feature(intrinsics)]
#![feature(rustc_attrs)]
#![no_main]
use core::panic::PanicInfo;
extern "rust-intrinsic" {
    #[rustc_nounwind]
    pub fn atomic_cxchgweak_relaxed_relaxed<T: Copy>(dst: *mut T, old: T, src: T) -> (T, bool);
}
fn perform_bad() -> usize {
    unsafe {
        let mut var = 0;
        let _result = atomic_cxchgweak_relaxed_relaxed(&mut var, 0, 1);
    }
    return 3;
}
#[panic_handler]
fn panic(_panic: &PanicInfo<'_>) -> ! {
    loop {}
}
#[no_mangle]
#[start]
fn main() -> usize {
    assert!(1 + 1 == 2);
    perform_bad();
    /*
    static CTR: AtomicU64 = AtomicU64::new(0);
    let mut last = CTR.load(Ordering::Relaxed);
    CTR.compare_exchange_weak(0, 1, Ordering::Relaxed, Ordering::Relaxed);
    */
    return 0;
}
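For comparison, here is a rough equivalent of perform_bad written against the stable atomics API. It is only a sketch of what the operation is expected to do when the call and return work correctly; it uses std rather than the no_std setup above, and the name perform_ok is hypothetical.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// Stable-API counterpart of the reproducer: swap 0 -> 1, then return 3.
fn perform_ok() -> (i32, usize) {
    let var = AtomicI32::new(0);
    // A weak CAS may fail spuriously even when the value matches, so retry.
    while var
        .compare_exchange_weak(0, 1, Ordering::Relaxed, Ordering::Relaxed)
        .is_err()
    {}
    (var.load(Ordering::Relaxed), 3)
}

fn main() {
    let (value, ret) = perform_ok();
    // The variable should now hold 1 and the function should return normally.
    assert_eq!((value, ret), (1, 3));
}
```

In the failing case reported here, the swap itself appears to succeed; it is the return from the enclosing function that goes wrong.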
Sorry, I forgot about this.
Did you build the sysroot with --release-sysroot?
I followed the exact steps on the README:
$ git clone https://github.com/antoyo/gcc
$ sudo apt install flex libmpfr-dev libgmp-dev libmpc3 libmpc-dev
$ mkdir gcc-build gcc-install
$ cd gcc-build
$ ../gcc/configure \
--enable-host-shared \
--enable-languages=jit \
--enable-checking=release \ # it enables extra checks which allow to find bugs
--disable-bootstrap \
--disable-multilib \
--prefix=$(pwd)/../gcc-install
$ make -j4 # You can replace `4` with another number depending on how many cores you have.
$ cat config.toml
gcc-path="/home/liam/rustc_gcc/gcc-build/gcc"
$ ./y.sh prepare # download and patch sysroot src and install hyperfine for benchmarking
$ ./y.sh build --sysroot --release
$ ./y.sh test --release
...
[AOT] arbitrary_self_types_pointers_and_wrappers
Command failed to run: Command `target/out/arbitrary_self_types_pointers_and_wrappers` failed to run: "Process received signal 11"
When I try with ./y.sh build --release-sysroot --release --sysroot, I get the same result.
Ok, I was asking because there are known issues and compiling the sysroot in release mode is a workaround for at least some of them as you can see here.
This specific one looks very similar to the one I had in the above thread (it was an atomic intrinsic, and it was jumping to address 0).
I can't find where those intrinsics are defined right now, but if the one you have a problem with is declared with #[naked], this could be the same issue.
The naked attribute isn't supported by GCC on AArch64, but there's a PR in Rust that will stop it from using the codegen naked attribute, so if this is indeed the issue we have here, that PR might solve it.
Perhaps it would be worth trying to compile your reproducer above in release mode (./y.sh run --release) to see if this changes anything.
I won't have time to look at this soon though, but I can help you investigate after I recover from Covid-19.
Take as much time as you need, I hope you feel better soon.
As for the reproducer, it's essentially a stripped-down version of arbitrary_self_types_pointers_and_wrappers, which is the first test that depends on loading std.
I also don't think these intrinsics are declared with #[naked] but I could be wrong. I haven't looked at this in a while.
https://doc.rust-lang.org/1.80.1/src/core/intrinsics.rs.html
I just tried on Asahi Linux on a Mac M1, and both the example you posted above and arbitrary_self_types_pointers_and_wrappers work well for me.
I do see the test arbitrary_self_types_pointers_and_wrappers failing if I compile the sysroot without --release-sysroot, though.
Which OS do you use?
Is your CPU an M1? If not, which one is it?
According to the GDB version string, OP uses Gentoo Linux. macOS has a slightly different ABI from the official calling convention specified by Arm (AAPCS) that is used by Linux.
@bjorn3, Antoyo's setup seems very similar to mine. I run arm64 Linux, but my host OS is macOS, and I'm emulating Gentoo Linux on arm64 using QEMU. My CPU is an Apple M3 Pro.
My host GCC is 13.2.1. I compiled gcc 14.0.1 from antoyo/gcc sha fd3498bff0b939dda91d56960acc33d55f2f9cdf .
I'd be very surprised if the difference between QEMU on M3 Pro vs Asahi on M1 is a factor here.
@antoyo, I misunderstood your original --release-sysroot comment.
./y.sh build --sysroot --release --release-sysroot
./y.sh test --release --release-sysroot
This combination results in arbitrary_self_types_pointers_and_wrappers passing.
I'd guess this is the same issue that #242 (comment) points out.