Linking with mold breaks initialization of thread local storage in Rust under FreeBSD on AMD64
mqudsi opened this issue · 6 comments
Compiling fish-shell from git master with rust 1.79.0 (latest from FreeBSD pkg repos) and mold 2.31.0 (latest from pkg repos) causes random segfaults that were traced back to misbehaving thread_local!
static variables.
Everything starts off fine, but after a few seconds of using the shell (typing at the prompt) it segfaults due to an incorrect tls thread id making some assertions fail. I edited the code some to add some debug stderr output to see what was going on and it seems that either after a certain number of times (but more likely from certain call sites) thread local storage is not being initialized upon first access by a new thread, and it is retaining its default (zero) value.
For reference, here's the (patched) function in question that assigns a custom thread id based off of an atomic variable that is incremented once per thread:
#[inline(never)]
pub fn thread_id() -> usize {
static THREAD_COUNTER: AtomicUsize = AtomicUsize::new(0);
thread_local! {
static THREAD_ID: usize = (|| {
let id = THREAD_COUNTER.fetch_add(1, Ordering::Relaxed);
eprintln!("Rust thread {:?} assigned fish thread id {}",
std::thread::current().id(), id);
id
})();
}
let id = THREAD_ID.with(|id| *id);
// ThreadId(1) is assigned (by rust's std) to the main thread upon startup
if std::thread::current().id() != unsafe { std::mem::transmute(1_u64) } {
eprintln!("Rust thread {:?} accessing fish thread_id({})", std::thread::current().id(), id);
}
id
}
Under FreeBSD, compiling with environment variable RUSTFLAGS
set to -C link-args=--ld-path=/usr/local/bin/mold
, the following incorrect behavior is observed:
- At the start of execution, new background threads emit the first
eprintln!()
call printing the mapping from the rust-native thread id to the simpleusize
fish thread id when they first accessthread_id()
and then they print the secondeprintln!()
call upon that and each subsequent invocation - Then (some seconds later, not during a race) a newly created thread (with rust thread id that is not
ThreadId(1)
) is observed callingthread_id()
but not entering thethread_local!()
initializer; it prints the secondeprintln!()
call with an incorrectid
value of0
and is never observed to enter thethread_local!()
initializer.
Compiling and running fish from master under FreeBSD:
$ pkg install git-lite cmake ninja rust mold
$ git clone https://github.com/fish-shell/fish-shell.git
$ cd fish-shell
$ env RUSTFLAGS="-C link-args=--ld-path=/usr/local/bin/mold" make
$ build/fish
This behavior is observed with the default (DEBUG) build, without any optimizations or architecture-specific optimizations enabled. Tested under FreeBSD 13.3. The same procedure without the RUSTFLAGS
environment variable set to specify building with mold
results in a binary that works correctly.
Happy to post any differential outputs of the mold-linked and normal builds as needed to help pin this down. For what it's worth, I do not observe any similar behavior using mold to link fish under Linux/AMD64.
Edit: this version of the function more directly illustrates the problem; the debug_assert_ne!()
call results in a panic because TLS was not initialized. Again, this doesn't happen right off the bat; it's after some time has passed/from a particular call site.
diff --git a/src/threads.rs b/src/threads.rs
index 06b59649f..a03bbe0af 100644
--- a/src/threads.rs
+++ b/src/threads.rs
@@ -89,3 +89,3 @@ fn main_thread_id() -> usize {
fn thread_id() -> usize {
- static THREAD_COUNTER: AtomicUsize = AtomicUsize::new(0);
+ static THREAD_COUNTER: AtomicUsize = AtomicUsize::new(42);
// It would be much nicer and faster to use #[thread_local] here, but that's nightly only.
@@ -97,3 +97,5 @@ fn thread_id() -> usize {
}
- THREAD_ID.with(|id| *id)
+ let id = THREAD_ID.with(|id| *id);
+ debug_assert_ne!(id, 0, "Thread local storage not initialized!");
+ id
}
Good news, it seems not to reproduce with mold 1.11.0 so it's probably possible to bisect this if you don't have a more direct way of arriving at the cause.
This bisected to c395da1, which was a fix for #1216
c395da1c5414656e1b22f3e71a42ba17c51673a4 is the first bad commit
commit c395da1c5414656e1b22f3e71a42ba17c51673a4 (HEAD)
Author: Rui Ueyama <ruiu@cs.stanford.edu>
Date: Wed Mar 13 16:01:02 2024 +0900
Do not create an unnecessary gap in file for BSS
Fixes https://github.com/rui314/mold/issues/1216
elf/passes.cc | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
I don't have a FreeBSD machine, and it is not easy to set up a new one. I usually use Docker, but that works only for Linux guests. Did you know if there's a FreeBSD machine that I can login to debug this issue?
Hey, thanks for your interest in looking into this. You should be able to ssh rui314@dev.neosmart.net -p 22314
into a freshly installed copy of FreeBSD 13.3 with basic dev tools installed (neovim/nvim is also installed).
pkg search pkg-name
and sudo pkg install pkg-name
is probably the core FreeBSD thing to know.
Thank you for the test environment. The above fix should work for you. You can remove the test environment now.
Thanks for the fix; that was fast! ❤️