Tracking issue for `thread_local` stabilization

The #[thread_local] attribute is currently feature-gated. This issue tracks its stabilization.

Known problems:

#[thread_local] translates directly to the thread_local attribute in LLVM. This isn't supported on all platforms, and it's not even supported on all distributions within the same platform (e.g. macOS 10.6 didn't support it but 10.7 does). I don't think this is necessarily a blocker, but I also don't think we have many other attributes that are this platform-specific.
Statics that are thread local shouldn't require Sync - #18001
Statics that are thread local should either not borrow for the 'static lifetime or should be unsafe to access - #17954
Statics can currently reference other thread local statics, but this is a bug - #18712
Unsound with generators #49682
static mut can be given 'static lifetime with NLL (#54366)

@alexcrichton Can you elaborate on the blockers?

Certainly! Known issues to me:

#[thread_local] translates directly to the thread_local attribute in LLVM. This isn't supported on all platforms, and it's not even supported on all distributions within the same platform (e.g. macOS 10.6 doesn't support it but 10.7 does). I don't think this is necessarily a blocker, but I also don't think we have many other attributes that are this platform-specific.
Statics that are thread local shouldn't require Sync - #18001
Statics that are thread local should either not borrow for the 'static lifetime or should be unsafe to access - #17954
Statics can currently reference other thread local statics, but this is a bug - #18712

That's at least what I can think of at this time!

A note, we've since implemented cfg(target_thread_local) which is in turn itself feature gated, but this may ease the "this isn't implemented on all platforms" worry.

Hi! Is there any update on the status? Nightly still requires statics to be Sync. I tried with:

rustc 1.13.0-nightly (acd3f796d 2016-08-28)
binary: rustc
commit-hash: acd3f796d26e9295db1eba1ef16e0d4cc3b96dd5
commit-date: 2016-08-28
host: x86_64-unknown-linux-gnu
release: 1.13.0-nightly

@alexcrichton Any news on #[thread_local] becoming stabilized? AFAIK, at the moment it is impossible on DragonFly to access errno variable from stable code, other than directly from libstd. This blocks crates like nix on DragonFly, which want to access errno as well, but libstd is not exposing it, and stable code is not allowed to use feature(thread_local).

@mneumann no, no progress. I'd recommend a C shim for now.

@alexcrichton thanks. I am doing a shim now https://github.com/mneumann/errno-dragonfly-rs.

The optimizations are too aggressive ;)

See this code:

#![feature(thread_local)]

#[thread_local]
pub static FOO: [&str; 1] = [ "Hello" ];

fn change_foo(s: &'static str) {
    FOO[0] = s;
}

fn main() {
    println!("{}", FOO[0]);
    change_foo("Test");
    println!("{}", FOO[0]);
}

The compiler does not detect the side effect in change_foo and removes the call in release. The output is:

Hello
Hello
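For comparison, here is one way to express the intended behavior on stable Rust, assuming the goal is a per-thread, mutable one-element array. It uses thread_local! with a Cell, so the write is a genuine interior-mutability side effect rather than a write to an immutable static that the optimizer may assume never changes. FOO mirrors the name above; change_foo and current are illustrative helpers, not part of any proposed API.

```rust
use std::cell::Cell;

thread_local! {
    // Each thread gets its own copy, initialized to ["Hello"].
    static FOO: [Cell<&'static str>; 1] = const { [Cell::new("Hello")] };
}

fn change_foo(s: &'static str) {
    // The write goes through `Cell::set`, so the compiler must treat it
    // as a side effect on this thread's copy of FOO.
    FOO.with(|foo| foo[0].set(s));
}

fn current() -> &'static str {
    FOO.with(|foo| foo[0].get())
}

fn main() {
    println!("{}", current());
    change_foo("Test");
    println!("{}", current());
}
```

Run in a release build, this prints Hello and then Test, because the mutation is visible through the Cell.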

cc @eddyb. @Boiethios, your example shouldn't actually compile, because it should require static mut, not just static.

It compiles with the last nightly rustc.

Oh, drat, this is from my shortening of the lifetime, i.e.:

rust/src/librustc/middle/mem_categorization.rs, lines 657 to 662 in dead08c:

// `#[thread_local]` statics may not outlive the current function.
for attr in &self.tcx.get_attrs(def_id)[..] {
    if attr.check_name("thread_local") {
        return Ok(self.cat_rvalue_node(id, span, expr_ty));
    }
}

@nikomatsakis what should we do here? I want a static lvalue, with a non-'static lifetime.

#18001, #17954 and #18712 were fixed by #43746.

@alexcrichton @eddyb Do you know any other blockers, or is this ready for stabilization?

There's some emulation clang does IIRC, that we might want to do ourselves, to support #[thread_local] everywhere.

And there's #47053 which results from my initial attempt to limit references to thread-local statics to the function they were created in.

@cramertj I've personally been under the impression that we're holding out on stabilizing this for as long as possible. We've stabilized very few (AFAIK) platform-specific attributes like this, and I at least personally haven't ever looked too hard into stabilizing this.

One blocker (in my mind at least) is what @eddyb mentioned where this is a "portable" attribute yet LLVM has a bunch of emulation on targets that don't actually support it (I think MinGW is an example). I don't believe we should allow the attribute on such targets, but we'd have to do a lot of investigation to figure out those targets.

Is there motivation for stabilizing this though? That'd certainly provide some good motivation for digging out any remaining issues and looking at it very closely. I'd just personally been under the impression that there's little motivation to stabilize this other than it'd be a "nice to have" in situations here and there.

I am using #[thread_local] in my code in a no_std context: I allocate space for the TLS segment and set up the architecture TLS register to point to it. Therefore I think that it is important to expose the #[thread_local] attribute so that it can at least be used by low-level code.

The only thing that I'm not too happy about is that Sync is no longer required for #[thread_local]: this makes it more difficult to write signal-safe code, since all communication between signal handlers and the main thread should be done through atomic types.

@Amanieu Signal/interrupt-safe code has other requirements which are not satisfied by current Rust.

@eddyb Not really, all you need to do is treat the signal handler as a separate thread of execution. The only requirement you need to enforce safety is that objects which are accessed by both the main thread and the signal handler must be Sync.

Of course you can still cause deadlocks if the main thread and signal handler both try to grab the same lock, but that's not a safety issue.
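To make the "treat the handler as another thread" argument concrete: an ordinary static must be Sync, which rules out something like a Cell<bool> and pushes shared handler state toward atomics. A minimal sketch follows; on_signal is a hypothetical handler body (registering it with the OS, e.g. via sigaction, is deliberately left out), and the assumption is that a lock-free atomic store is safe to perform from a signal handler on the target.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// A plain `static` must be `Sync`, so shared state uses an atomic type.
static SIGNAL_PENDING: AtomicBool = AtomicBool::new(false);

// Hypothetical handler body: only a single atomic store, no locks.
fn on_signal() {
    SIGNAL_PENDING.store(true, Ordering::SeqCst);
}

// Main-thread side: observe and clear the flag in one atomic step.
fn take_pending() -> bool {
    SIGNAL_PENDING.swap(false, Ordering::SeqCst)
}

fn main() {
    assert!(!take_pending());
    on_signal();
    assert!(take_pending());
    assert!(!take_pending());
}
```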

@cramertj regarding remaining blockers, I consider the behavior that @Boiethios reported above something that needs to be fixed before stabilization. I filed #54901 to follow up.

Copying @gnzlbg's comment from rust-lang/libc#1432

It appears that 1) thread_local! solves the problem for most people, and 2) the main uncertainty is that #[thread_local] isn't portable.

Maybe we could reduce the scope of an initial version of #[thread_local] to extern static declarations. If the target does not support #[thread_local], well then the extern static declaration is incorrect since there cannot be a definition anywhere, and using it would already be UB. We could add on top of this a "best effort" compilation error, e.g., if the compiler knows that the target doesn't support it.

I agree with the above, thread_local! and #[thread_local] for extern static unblocks virtually all use-cases, and would allow interfaces to errno using stable Rust on all platforms (see the linked libc issue for more context).

Right now there is no way (with stable Rust) to use C thread local variables via FFI without writing additional C glue code, see the errno-dragonfly crate for an example of such a (necessary) hack.

cc @joshtriplett: allowing #[thread_local] on extern statics might be something for the agenda of the WG-FFI, since interfacing with errno is kind of an important part of the C FFI puzzle.

@gnzlbg Thanks! Agreed, we definitely need thread-local-variable support.

Independent of the FFI need for extern static...

Like @Amanieu from last year, I'm using #[thread_local] in a no_std context (an RTOS, in my case), with the OS runtime handling management of the TLS pointer and memory. (They and I differ on one point, which is that I'm delighted that #[thread_local] lifts the Sync requirement for static. It seems good and right.)

#[thread_local] is currently the only unstable feature I have to rely on for program correctness. (I'm using a couple others for convenience, but I could lower them by hand if required. I cannot easily replicate the TLS link behavior by hand.)

I haven't dug into the compiler side of this, so this question may be naive, but is the has_elf_tls LLVM target feature not sufficient to gate this? We have a few other language features (if not attributes per se) that are gated by target support -- for example, my platform doesn't have AtomicU64. So it doesn't seem entirely without precedent.
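As a data point for the precedent argument, stable Rust already lets code branch on whether the target has AtomicU64 via cfg(target_has_atomic = "64"). A minimal sketch; next_id is a hypothetical helper, and the fallback behavior (returning None) is an arbitrary choice for illustration.

```rust
// Compiled only on targets with native 64-bit atomics.
#[cfg(target_has_atomic = "64")]
fn next_id() -> Option<u64> {
    use std::sync::atomic::{AtomicU64, Ordering};
    static NEXT: AtomicU64 = AtomicU64::new(0);
    Some(NEXT.fetch_add(1, Ordering::Relaxed))
}

// Fallback for targets such as the RTOS platform mentioned above.
#[cfg(not(target_has_atomic = "64"))]
fn next_id() -> Option<u64> {
    None
}

fn main() {
    println!("{:?} {:?}", next_id(), next_id());
}
```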

> @alexcrichton Any news on #[thread_local] becoming stabilized? AFAIK, at the moment it is impossible on DragonFly to access errno variable from stable code, other than directly from libstd. This blocks crates like nix on DragonFly, which want to access errno as well, but libstd is not exposing it, and stable code is not allowed to use feature(thread_local).

We now provide __errno_location() since this commit:

https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/60d311380ff2bf02a87700a0f3e6eb53e6034920

The original issue suggests that the thread_local attribute in LLVM isn't supported across all platforms, but has platform support improved in the time since this issue was opened? Or do we expect that it will always remain nonportable? (Is there an LLVM tracking issue?)

Hello, #[thread_local] can currently be applied to fields of a struct without any errors or warnings (it has no effect there, of course):

#![feature(thread_local)]

use std::sync::{Arc, Mutex};

#[derive(Debug)]
struct Foo {
    #[thread_local]
    bar: &'static str,
}

fn main() {
    let foo = Arc::new(Mutex::new(Foo {
        bar: "bar",  
    }));
    
    dbg!(foo.lock().unwrap().bar);
    
    let foo2 = foo.clone();
    std::thread::spawn(move || {
        foo2.lock().unwrap().bar = "baz";
    }).join().unwrap();

    dbg!(foo.lock().unwrap().bar);
}

Outputs:

[src/main.rs:16] foo.lock().unwrap().bar = "bar"
[src/main.rs:23] foo.lock().unwrap().bar = "baz"
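What the attribute on a field presumably intended (per-thread state) can be sketched on stable with thread_local!: a write made in a spawned thread only changes that thread's copy, and the main thread still sees the initial value. BAR is an illustrative name; LocalKey<Cell<T>>::get and set are the stable by-value accessors.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // One independent copy per thread, each starting at "bar".
    static BAR: Cell<&'static str> = const { Cell::new("bar") };
}

fn main() {
    let seen_in_child = thread::spawn(|| {
        BAR.set("baz"); // modifies the child thread's copy only
        BAR.get()
    })
    .join()
    .unwrap();

    assert_eq!(seen_in_child, "baz");
    assert_eq!(BAR.get(), "bar"); // the main thread's copy is untouched
    println!("main sees {:?}, child saw {:?}", BAR.get(), seen_in_child);
}
```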

Judging by the issue description, #![feature(thread_local)] seems unsound, but it's not flagged by the incomplete_features lint:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=0d95cf5246b5b41ddc6776527fdc89e5

Hm, that's fair... OTOH if we made incomplete_features fire we'd have to enable that lint in library/std, so we'd not easily notice when another unsound feature is enabled.

IMO it would be better to just make all accesses to these statics unsafe.

@RalfJung Well, I just read #71435 (comment) after commenting and am wondering whether I should edit and hide my comment or not.

@RalfJung It seems that #49682 is the only soundness problem, so this may already have implementation strategy: banning the case in #49682. Ignore me if so. Sorry for bothering.

> Well, I just read #71435 (comment) after commenting and am wondering whether I should edit and hide my comment or not.

Hm, I do not see the relation to be honest (also I was wrong when I posted that -- there is an implementation strategy).

#54366 sounds like it might be a soundness issue if it also happens for #[thread_local] static (without mut).

@RalfJung I think I was confused by confusing comments (not only yours) and probably misread some of them. Sorry. Ignore what I said before.

> Hm, that's fair... OTOH if we made incomplete_features fire we'd have to enable that lint in library/std, so we'd not easily notice when another unsound feature is enabled.
>
> IMO it would be better to just make all accesses to these statics unsafe.

core, alloc and stdarch already use #![allow(incomplete_features)]. Is std special, or are the #![allow(incomplete_features)] attributes in those crates planned to be removed?

Also, I think adding an attribute that suppresses incomplete_features only for specified features could solve the noticing issue generally (also in core, alloc and stdarch).

> core, alloc and stdarch have already used #![allow(incomplete_features)]

I'd hope there's work happening to get rid of that. :/ I thought things were better since the move to min_specialization. I guess this is mostly due to const generics.
At the very least there should be comments explaining why unsound features are enabled in the very foundation of Rust... but it seems not everyone agrees on such a policy, seeing that this code was reviewed and landed. Oh well.

> Also, I think adding an attribute that suppresses incomplete_features only for specified features could solve the noticing issue generally (also in core, alloc and stdarch).

That would help, yes.

With relocation-model=static, the compiler uses the fs "segment" to access the thread-local variable, as expected, but with pic it calls __tls_get_addr().
Shouldn't it use fs in both situations?

// relocation-model=static
example::f:
        mov     byte ptr fs:[example::FOO@TPOFF+3], 1
        mov     qword ptr fs:[example::BAR@TPOFF], 2
        ret

// relocation-model=pic
example::f:
        sub     rsp, 8
        lea     rdi, [rip + example::FOO@TLSLD]
        call    __tls_get_addr@PLT
        mov     byte ptr [rax + example::FOO@DTPOFF+3], 1
        mov     qword ptr [rax + example::BAR@DTPOFF], 2
        pop     rax
        ret


As far as use cases go. All the bare-metal Arm targets seem to support #[thread_local] definitions. Not using #[thread_local] definitions leads to a lot of trouble because it appears there's no easy way, outside of compiler code, to figure out a type's memory layout in time to tell the linker what to do. It's possible to fully emulate TLS, of course, but that could be a lot of overhead for a bare-metal target.

I don't have a problem, personally, sticking to Nightly for this, though.

Edit: I would worry more about these soundness issues in my project, but accessing static muts is already unsafe, so there's a built-in caveat emptor. (Project docs: "Safety: Don't let this reference escape the thread via...")

@Soveu the POSIX/ELF AMD64 ABI has 4 models for accessing Thread-Local Storage. See this doc on Thread-Local Storage models for more information.

Global Dynamic (Section 4.1.6)
- Uses a call to __tls_get_addr() for each global variable access
- Used for extern/pub variables if Rust doesn't know whether the code will be statically linked
Local Dynamic (Section 4.2.6)
- Uses one call to __tls_get_addr() to get the address of the module's TLS block, then uses relative (@DTPOFF) offsets for subsequent accesses
- Used for private variables if Rust doesn't know whether the code will be statically linked
Initial Exec (Section 4.3.6)
- Uses fs with the GOT and a @GOTTPOFF offset
- Used for extern/pub variables if Rust knows the code will be statically linked
Local Exec (Section 4.4.6)
- Uses fs without the GOT and a @TPOFF offset
- Used for private variables if Rust knows the code will be statically linked

Depending on how you end up linking the binary (and if you use LTO) Rust/LLVM might use one of the "Exec" models even if your relocation model is pic. Using the static relocation model just tells LLVM that the code will definitely be statically linked.

This example playground should be able to show all 4 models (if you switch pic to static)

There's also the -Ztls-model Nightly rustc flag, if you want to force a model to be used: https://doc.rust-lang.org/unstable-book/compiler-flags/tls-model.html.

Interesting, I thought TLS is basically a thread-local copy of .tdata and fs holds an offset between .tdata and the copy

btw, is there a way to tell rustc to use gs for thread locals instead of fs?

> btw, is there a way to tell rustc to use gs for thread locals instead of fs?

I would also really like this (for kernels and the like), but I don't think LLVM supports it, as it's not one of the 4 TLS models described above. You would have to get a 5th TLS model (maybe called kernel) added to LLVM to support this.

I heard some rumors that Redox does that, but I can't find how

> I heard some rumors that Redox does that, but I can't find how

Looks like they modified their binutils: https://gitlab.redox-os.org/redox-os/binutils-gdb/-/merge_requests/5

> I heard some rumors that Redox does that, but I can't find how
>
> Looks like they modified their binutils: https://gitlab.redox-os.org/redox-os/binutils-gdb/-/merge_requests/5

It would have been better if they had submitted this change to upstream binutils; then we could all use it.

Okay, I think I found a way to make things work using just #[thread_local] declarations instead of full definitions. It only requires some form of asm support (playground).

The trick is figuring out how to link the extern declaration to a full definition without LLVM noticing and ignoring the original declaration. The .equiv asm directive seems to accomplish this with minimal voodoo magic.

Would it be possible to partially stabilize this for, say, the x86_64 architecture in user-space code, while not supporting problematic types like generators?

Marking as impl-incomplete since it may still have soundness issues, and may misbehave on platforms without TLS support.

EDIT: original proposal here was unsound wrt scoped threads, see #29594 (comment)


I'm not sure where this should go, and I don't have the time for a complete RFC, but "a lifetime shorter than 'static" came up recently in the context of some rustc internal data structures (which are owned thread-locally, not coincidentally), so:

(with apologies to any previous suggestions similar to this, that may have been dismissed in the past - I couldn't find anything in this very issue, at least)

If we added a 'thread lifetime, like 'static but strictly shorter:

- borrowing a #[thread_local] (and even thread_local!) could produce a &'thread T reference
  - it should even be returnable from functions (as long as it doesn't leave the thread, see below)
- to preserve lifetime parametricity, &'thread T: Send + Sync has to be true if T: Sync (just like with &'a T for any other 'a), and soundness of 'thread usage shouldn't require it
  - in fact, it should be entirely possible to pass such a &'thread T "down" to a scoped thread, just like how f(&THREAD_LOCAL) (with #[thread_local]) or THREAD_LOCAL.with(|data| f(data)) (with thread_local!) work today
- everything that requires T: 'static today would still not allow anything containing these new 'thread lifetimes, and that includes:
  - potentially-detachable (i.e. non-scoped) threads
    - any kind of communication between non-scoped threads is also indirectly affected by that bound, e.g. mpsc::Receiver<T>: 'static requires T: 'static
    - this also extends to thread pools running async tasks: a Future from an async fn, which is required to be 'static, cannot keep a &'thread T across a suspension point
  - thread_local!'s data type
    - essentially stating "no &'thread T can get from the normal thread execution into thread_local! destructor execution"
    - a thread_local! destructor could itself obtain a &'thread T reference, but it would be limited in scope to that destructor's execution, since there would be no place available to stash it
    - for this to be sound, I believe #[thread_local] also needs the 'static bound, otherwise it could hold a reference to a thread_local! that a different thread_local!'s destructor could read back
- there are two styles for "getting short-lived references" APIs today:
  - fn with(&self, f: impl FnOnce(&T))
    - the only option sound today for thread-local Ts; thread_local! uses it
  - fn get(&self) -> &T (Deref-like)
    - with owned self, this is unsound because Box::leak(Box::new(self)).get() returns a &'static T, even if self itself is !Send/!Sync
    - this is (incorrectly) used in a few places in rustc today, and that's where this whole idea came from
  - but with 'thread, we could have the best of both worlds, and return &'thread T, to be more flexible than today's sound option (with) while (hopefully?) remaining sound
    - the only downside is that Deref can't be implemented directly (except on a type that has &'thread T as a field itself)
  - a struct with PhantomData<&'thread ()> would make that type never pass a 'static bound check, so even Box::leak(Box::new(self)) would only produce a &'thread Self, not a &'static Self, meaning a Deref-like API would actually remain sound
    - though this sort of thing could complicate the implementation, unless it's only allowed by making the struct lifetime-generic and passing 'thread in that way

Based on the little I remember about how 'static is implemented, there's a good chance of this being easy to fully implement (copy how 'static is handled, but make it strictly "shorter" in the lattice), but I wouldn't bet on it just yet.

EDIT:

while looking around for precedents, I found a full-fledged RFC (rust-lang/rfcs#1705) from 2016 that I had completely forgotten about - looking in it now to compare it...
oh, it made the classic mistake of proposing &'thread T: !Send (which is impossible because of lifetime parametricity, see above) instead of relying only on 'thread < 'static:

> Any type depending on 'thread (i.e., a type product of the type construction from 'thread) is !Send, and thus bounded to the current thread.
@nikomatsakis caught onto that issue in this comment: rust-lang/rfcs#1705 (comment)
overall I feel like that RFC was not arguing for itself very well - the fn-scoped #[thread_local] limitation we ended up with instead is enough for soundness, and there is no talk about letting the LocalKey API of thread_local! use the 'thread lifetime etc.

@eddyb
I really hope for this feature to be stabilized as soon as possible, but I must ask the following question:

> it should be entirely possible to pass such a &'thread T "down" to a scoped thread

Is it guaranteed that the same thread-local pointer stays valid when passed to a different thread? I can imagine a system which maps the same numeric pointer to different physical addresses for different threads via MMU magic.

> Is it guaranteed that the same thread-local pointer stays valid when passed to a different thread? I can imagine a system which has the same numeric pointer to point to different physical addresses for different threads via MMU magic.

That is not allowed. Thread-local variables for different threads must have distinct addresses.
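The distinct-address rule can be observed with stable thread_local!: two threads that are alive at the same time see different addresses for their copies of the same thread-local. SLOT and slot_addr are illustrative names. The Barrier keeps both threads alive while the addresses are taken, since a thread that has already exited could in principle have its TLS memory reused by a later thread.

```rust
use std::cell::Cell;
use std::sync::Barrier;
use std::thread;

thread_local! {
    static SLOT: Cell<u32> = const { Cell::new(0) };
}

// Address of the calling thread's copy of SLOT, as an integer.
fn slot_addr() -> usize {
    SLOT.with(|slot| slot as *const Cell<u32> as usize)
}

fn main() {
    let barrier = Barrier::new(2);
    let (a, b) = thread::scope(|s| {
        let h1 = s.spawn(|| {
            let addr = slot_addr();
            barrier.wait(); // both threads are alive at this point
            addr
        });
        let h2 = s.spawn(|| {
            let addr = slot_addr();
            barrier.wait();
            addr
        });
        (h1.join().unwrap(), h2.join().unwrap())
    });
    assert_ne!(a, b);
    println!("thread 1: {a:#x}, thread 2: {b:#x}");
}
```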

> Is it guaranteed that the same thread-local pointer stays valid when passed to a different thread? I can imagine a system which has the same numeric pointer to point to different physical addresses for different threads via MMU magic.
>
> That is not allowed. Thread-local variables for different threads must have distinct addresses.

@newpavlov To expand a bit further: such a pointer could not, in Rust, use the type &T, and instead would have to be some kind of wrapper that only produces a &T if it can compute a "global" pointer (i.e. one valid across all threads).

If T: Sync, then &T: Send holds and the pointer can make its way into a different thread, as long as it's not accessed outside of its original scope. Even with a !Sync pointee, I'd still be wary of any further derived reference existing (&Foo doesn't give Foo as much control over the reference as a separate FooRef type would).

This applies to thread_local!'s .with(|short_lived_ref| ...) method today, which doesn't stop you in any way from e.g. spawning some scoped threads and capturing short_lived_ref in them, inside the closure.
(And #[thread_local] statics have their equivalent, where do_anything_with(&THREAD_LOCAL_STATIC) passes the reference "down the stack" to a function that doesn't really see it as any other reference).
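The .with(|short_lived_ref| ...) case can be sketched on stable: because AtomicU32 is Sync, the &AtomicU32 handed to the closure is Send, so a scoped thread can read the parent's thread-local through it, while its own copy of the same thread-local remains at the initial value. COUNTER and parent_and_child_views are illustrative names.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

thread_local! {
    // AtomicU32 is Sync, so &AtomicU32 is Send and may enter a scoped thread.
    static COUNTER: AtomicU32 = const { AtomicU32::new(0) };
}

/// Returns (value read through the parent's reference, scoped thread's own copy).
fn parent_and_child_views() -> (u32, u32) {
    COUNTER.with(|parent_counter| {
        parent_counter.store(5, Ordering::Relaxed);
        thread::scope(|s| {
            s.spawn(|| {
                // Reads the parent thread's COUNTER through the borrowed reference...
                let via_ref = parent_counter.load(Ordering::Relaxed);
                // ...while this thread's own COUNTER still holds its initial value.
                let own_copy = COUNTER.with(|c| c.load(Ordering::Relaxed));
                (via_ref, own_copy)
            })
            .join()
            .unwrap()
        })
    })
}

fn main() {
    let (via_ref, own_copy) = parent_and_child_views();
    println!("via parent's reference: {via_ref}, scoped thread's own copy: {own_copy}");
}
```

The scope guarantees the spawned thread finishes before with returns, which is what keeps the borrowed reference valid.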

Reading back my sketch for 'thread above (#29594 (comment)), and specifically:

> in fact, it should be entirely possible to pass such a &'thread T "down" to a scoped thread,

I had some thoughts about interesting lattice interpretations of 'thread, namely that it sits somewhere between 'static and all other stack-related lifetimes within a thread.

With detached threads requiring 'static bounds to pass data between them, each detached thread ends up with a hierarchy of 'static > 'threadX > ... (for some thread X), so 'thread can be seen as a kind of combination (union/intersection aka join/meet, whichever is correct) of all 'threadX.

But that only works for detached threads - for scoped threads you end up with 'threadX lifetimes that can be arbitrarily small, which means 'thread actually becomes isomorphic to "the empty lifetime" (i.e. it has to be treated as shorter than any other lifetime, kind of like a "bottom" equivalent) and it's impossible to do almost anything with it, if you're to remain sound.

In fewer words, my original proposal was unsound as stated.

As an example, imagine passing &'a Cell<&'thread T> to a scoped thread - if the scoped thread can place its own TLS references in there, those will become invalid when it exits.
But if we take the correct interpretation mentioned earlier, that type is illegal, because 'thread is shorter than any lifetime (including 'a) - problem averted, right?

Well but now you can never have let x = &THREAD_LOCAL; because x has a scope that's arguably longer than 'thread (it's shorter than the current thread, but with just one 'thread there's no way to distinguish).
So you're either useful and unsound wrt scoped threads, or sound but useless.

There might be a way to salvage the hope of a 'thread that doesn't need to interfere with Sync/Send at all, which I didn't come up with myself but was suggested to me by @eternaleye:

'thread could become an implicit extra lifetime parameter to all functions. Given @tmandry and @nikomatsakis' discussions around varieties of contextual implicits, it could be seen as a compiler-implied with 'thread.

To limit compilation performance impact and whatnot, it would ideally only be nameable anywhere in the function if it shows up in the signature or where clauses, and calling a function that needs 'thread from one that doesn't, could just use the outermost scope of the caller (effectively "statically known top of the thread", which is appropriate, given that the caller could literally be a thread entry-point for all we know).

For detached threads, we have the same solution: 'thread won't pass the 'static bounds required by e.g. thread::spawn, unless you have a 'thread: 'static bound on the caller - which could either never be satisfiable, or, as an interesting twist, could perhaps be satisfiable inside fn main specifically, encoding in the language that "the main thread lives forever" (and yes this would also mean &THREAD_LOCAL from within fn main would be &'static - could have interesting implications).

For scoped threads, it boils down to "only direct calls to functions pass 'thread" - the amount of dynamism (whether fn pointers or dyn FnOnce) involved forces the new thread's entry-point to effectively be 'thread-polymorphic, and anything using 'thread from the parent thread looks like it could be any stack lifetime.

The earlier example of &'a Cell<&'thread T> would just be a &'a Cell<&'b T>, and &'b T could only be created by the scoped thread using any other &'b references it may have gotten from its parent thread.

In fact, without some with 'thread-like abstraction at the trait impl level, even calling a trait method should probably leave the callee 'thread-polymorphic for now.
So writing 'thread anywhere other than in the signature/where clauses of a free function or inherent impl, should be an error (i.e. a type or trait definition should take an explicit lifetime parameter and not refer to 'thread).

A limited version of this feature could be implemented right away, and because it's mostly just desugaring to lifetime parameters (with only the choice of what's passed for the parameter in direct calls, and the lifetime in &THREAD_LOCAL's type, being 'thread-specific semantics), it's way more likely to be sound.

#[thread_local] causes an Internal Compiler Error if used with proc macro-specific types (like proc_macro::{Delimiter, Group, Ident, Punct, Spacing, Span, TokenStream, TokenTree}). Could we either:

- document that it's not for proc macro-specific types, or
- have a tracking issue for this, please?

#115621

Segmentation fault on Windows. Is it a misuse of the feature, or a bug?

Would it make sense to just reject the attribute on targets where target_thread_local is not set?

> #[thread_local] causes an Internal Compiler Error if used with proc macro-specific types (like proc_macro::{Delimiter, Group, Ident, Punct, Spacing, Span, TokenStream, TokenTree}). Could we either

This has nothing to do with this tracking issue; it's about the entire proc-macro system being incompatible with thread-local state (no matter how that state got implemented).

This tracking issue is about thread-local state specifically implemented via the native mechanism of the linker (as opposed to something like pthread keys). The thread_local! macro has different implementations depending on the target; sometimes it uses linker-native thread-locals (internally using the feature tracked here), sometimes it uses slower but less fragile run-time OS-provided mechanisms.

I wonder if there's a path towards a minimal thread_local stabilization by being very restrictive?

E.g.:

- all access is by value only
- only allow Copy types (or maybe even restrict it to only a subset of primitives)
- don't allow lifetimes
- don't allow composing thread_local with other lang attributes (they may interact in "interesting" ways)
- make thread_local error if the platform does not support it. This would imply also stabilizing target_thread_local, but maybe the name should be subject to bikeshedding first (e.g. has_static_thread_local) to make clear that a platform without it may still have runtime thread locals.
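For comparison, stable thread_local! already offers something close to the "by value only, Copy only" surface listed above: LocalKey<Cell<T>> has get and set methods that copy values in and out and never hand a reference to the caller. A minimal sketch; DEPTH, enter and leave are illustrative names, and this is the macro-based API, not the restricted attribute being proposed.

```rust
use std::cell::Cell;

thread_local! {
    static DEPTH: Cell<u32> = const { Cell::new(0) };
}

/// Increments this thread's depth and returns the previous value.
/// Only values cross the boundary; no reference to DEPTH escapes.
fn enter() -> u32 {
    let prev = DEPTH.get();
    DEPTH.set(prev + 1);
    prev
}

/// Decrements this thread's depth, saturating at zero.
fn leave() {
    DEPTH.set(DEPTH.get().saturating_sub(1));
}

fn main() {
    assert_eq!(enter(), 0);
    assert_eq!(enter(), 1);
    leave();
    assert_eq!(DEPTH.get(), 1);
}
```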

What is the motivation for that? Doesn't thread_local! { static NAME = const { ... } } suffice?

no_std mainly

Does that form of the macro need anything that requires std? Could there be a core::thread_local! { ... } macro that makes that part of the functionality available without std (i.e., it would require const blocks)?

We could make a macro I guess. It'd be pretty redundant though when/if thread_local proper is stabilized. Unless we decide to just have a macro and not the attribute.

The API provided by #[thread_local] static and const-thread_local is pretty different I think. So your proposal does introduce redundancy that we currently don't have. There'd be two stable ways to do the same thing and two completely different mechanisms to ensure the necessary restrictions (such as "no 'static references").

The "two ways to do things" will occur whatever happens unless we simply don't stabilize #[thread_local] static.

That is true. If the usecases are covered I don't see a reason to stabilize #[thread_local] static. It could be transitioned to an internal feature, an implementation detail of the public macros.

core::thread_local! would also need LocalKey moved into core::thread (or an equivalent to it). One thing that thread_local! forces that #[thread_local] doesn't is going through a shared reference. There would also still need to be a way to know if thread_local is supported in no_std.
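To illustrate the shared-reference constraint: since thread_local! only ever hands out &T, mutating non-Copy state goes through interior mutability. A sketch using the stable LocalKey<RefCell<T>>::with_borrow_mut and with_borrow helpers; LOG, record and count are illustrative names.

```rust
use std::cell::RefCell;

thread_local! {
    static LOG: RefCell<Vec<String>> = const { RefCell::new(Vec::new()) };
}

fn record(msg: &str) {
    // Equivalent to LOG.with(|l| l.borrow_mut().push(...)): the macro gives
    // `&RefCell<_>`, and the mutable borrow is checked at run time.
    LOG.with_borrow_mut(|log| log.push(msg.to_owned()));
}

fn count() -> usize {
    LOG.with_borrow(|log| log.len())
}

fn main() {
    record("a");
    record("b");
    println!("{} entries on this thread", count());
}
```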

> What is the motivation for that? Doesn't thread_local! { static NAME = const { ... } } suffice?

IIRC generated assembly for thread_local! was quite ugly when compared to equivalent #[thread_local] code. I haven't measured performance impact and do not know if it's possible to work around it with std changes, but it's still an example of non-zero-costness. Also, #[thread_local]-based code simply looks nicer and more ergonomic.

For the new thread_local! { static FOO: ... = const { ... } } syntax, it should all be optimized down to a minimum. If not, then we should really try to fix that.

EDIT: assuming the type of FOO has no drop, of course.

> Also, #[thread_local]-based code simply looks nicer and more ergonomic.

If they only work for Copy types and you can't take any references (and hence also not call any &self/&mut self methods), I am not sure if that's still true.

And taking references would be unsound.

And taking references would be unsound.

Borrowck limits those references to the current function already I believe.

Yes. The goal would be to eventually stabilize full #[thread_local] (once the bugs etc. are fixed). But that's not happening any time soon, and in the meantime a minimal stabilization would be useful.

Oh interesting, I wasn't aware that borrowck had special treatment for thread_local statics.

I'd be very interested in seeing this stabilized.
I use #[thread_local] in the interface code for my WIP OS, as an import/export for thread_local statics (https://github.com/LiliumOS/lilium-sys/blob/main/src/sys/io.rs#L44-L54). The handles here are thread-local statics (handles are themselves thread-local resources in the OS), initialized by the USI (userspace standard interface) when you create a thread. (This specific case is limited to a Copy type, and you probably won't be borrowing the handles often at all.)

In general, being able to import an external TLS var without shim code written in C is useful (for example, if you want to grab __errno). This cannot be done with LocalKey and std::thread_local!.
#[thread_local] would also enable using TLS in a no-std context, such as a kernel.

@RalfJung

If they only work for Copy types and you can't take any references (and hence also not call any &self/&mut self methods), I am not sure if that's still true.

How about introducing a pointer-based variant of #[thread_local]? Something like this:

// Creates a TLS value which stores 42 and creates a "pointer" `FOO` to it.
#[thread_local_ptr]
static FOO: *mut u64 = 42u64;

fn increment_foo() {
    // SAFETY: the reference to `FOO` does not escape the current thread
    let foo: &mut u64 = unsafe { &mut *FOO };
    *foo += 1;
}
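The #[thread_local_ptr] attribute above is only a proposal and does not exist. A rough approximation of the same pointer-based shape on stable Rust today stores an UnsafeCell in thread_local! and hands out the raw pointer from UnsafeCell::get. This is a sketch, and the foo_ptr helper is hypothetical. The pointer is only valid on the thread that obtained it and only until that thread exits.

```rust
use std::cell::UnsafeCell;

thread_local! {
    // Per-thread storage holding 42, mirroring `static FOO: *mut u64 = 42u64;`.
    static FOO: UnsafeCell<u64> = const { UnsafeCell::new(42) };
}

/// Raw pointer to the calling thread's copy of FOO (hypothetical helper).
fn foo_ptr() -> *mut u64 {
    FOO.with(|cell| cell.get())
}

fn increment_foo() {
    // SAFETY: the pointer refers to this thread's FOO, which lives until thread
    // exit, and no other reference to it is live during this call.
    let foo: &mut u64 = unsafe { &mut *foo_ptr() };
    *foo += 1;
}

fn main() {
    increment_foo();
    increment_foo();
    let here = unsafe { *foo_ptr() };
    // A new thread sees its own, untouched copy.
    let there = std::thread::spawn(|| unsafe { *foo_ptr() }).join().unwrap();
    println!("{here} {there}"); // prints "44 42"
}
```

Because *mut u64 is !Send, the compiler at least rejects moving the pointer itself into another thread. Casting it to usize would defeat that check.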

@newpavlov it seems someone already taught the borrow checker about thread_local, so taking references to them is actually sound. That's pretty cool.

I don't like there being this API duplication and inconsistency between that and the thread_local! macro, but that's up to libs-api to figure out.

We'd have to be rather careful where we enable this feature; in the past, I think on some Windows targets we switched back-and-forth between "true" thread-local statics and some other implementation for the thread_local! macro. The macro gives us the flexibility to do that; #[thread_local] does not, so once we allow it somewhere, if we later figure out there is some platform issue then we are in trouble.

I'm slightly bothered by thread-local statics pretending to be statics. They have very little in common with regular static in terms of their semantics. &FOO isn't even a constant, it is a function. We do reflect this in the MIR at least so bugs due to this are hopefully unlikely. And I don't have a proposal for a better syntax either so 🤷
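The "&FOO is a function, not a constant" point can be observed directly. The sketch below uses the stable thread_local! macro as a stand-in, since #[thread_local] static is nightly-only, and the foo_addr helper is made up for illustration. Taking the address of "the same" thread-local yields different values on different threads, but a stable value within one thread.

```rust
use std::cell::Cell;

thread_local! {
    static FOO: Cell<u64> = const { Cell::new(0) };
}

/// Address of the calling thread's copy of FOO.
fn foo_addr() -> usize {
    FOO.with(|f| f as *const Cell<u64> as usize)
}

fn main() {
    let here = foo_addr();
    // The main thread's FOO is still alive while the spawned thread's copy
    // exists, so the two must live at different addresses.
    let there = std::thread::spawn(foo_addr).join().unwrap();
    assert_ne!(here, there);
    // Within one thread the address does not change.
    assert_eq!(here, foo_addr());
    println!("main: {here:#x}, spawned: {there:#x}");
}
```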

I have a few use cases for #[thread_local] that can't be satisfied by the standard library's thread_local! macro:

The code base is a no_std binary which doesn't depend on libc (it has its own TLS/stack initialization code).
It exports #[thread_local] variables for use by C code (errno).
It used to reference #[thread_local] variables in inline assembly (by symbol name), although that code has since been refactored and no longer does that. It is possible to know the exact instruction sequence to use with the symbol because the whole code base is compiled with -Z tls-model=local-exec.

#[thread_local] translates directly to the thread_local attribute in LLVM. This isn't supported on all platforms, and it's not even supported on all distributions within the same platform (e.g. 10.6 doesn't support it but 10.7 does). I don't think this is necessarily a blocker, but I also don't think we have many attributes and such which are so platform specific like this.

What's the status of this? I see a lot of discussion around a hypothetical cfg(target_thread_local), but nothing concrete?

It's not hypothetical, cfg(target_thread_local) exists on nightly. However, historically we have turned this on and off on some platforms to work around various bugs, so we should be careful before just blanket-exposing this on stable.

It used to reference #[thread_local] variables in inline assembly (by symbol name), although that code has since been refactored and no longer does that. It is possible to know the exact instruction sequence to use with the symbol because the whole code base is compiled with -Z tls-model=local-exec.

If we just allow asm blocks to reference thread-locals, I worry it will lead to much confusion and errors. A thread-local is not just a normal symbol, after all -- but a Rust programmer might think it is, since in Rust code it behaves much like a static, but that's a sweet lie.

Unfortunately the inline asm docs don't even give an example for how sym is used at all. They do mention this though:

is allowed to point to a #[thread_local] static, in which case the asm code can combine the symbol with relocations (e.g. @plt, @tpoff) to read from thread-local data.

Presumably, it is UB to try to access that symbol without the exactly right set of relocations matching the current target and build flags? Seems like a pretty big footgun.

Presumably, it is UB to try to access that symbol without the exactly right set of relocations matching the current target and build flags? Seems like a pretty big footgun.

It's always safe to use the most general relocations (which involve calling __tls_get_addr). It's the more specific ones like @tpoff which are only valid with certain TLS models (for example local-exec is only valid in executables, not shared libraries).

Using __tls_get_addr requires you to emit a very specific set of bytes and relocations (on x86 this includes redundant prefixes). Getting anything wrong will likely cause TLS relaxation by the linker to either error out or generate a corrupt binary. And even with the most general relocations, you still have to deal with different object file formats using different ways of accessing TLS. We currently don't have any cfgs for the object file format, so writing something that works correctly on every OS for a given architecture is not possible.

Just curious, are there any thread-local modes that aren't based on ELF's thread structure system? e.g. on x86_64 will we eventually be able to simply emit instructions that use FS/GS segments directly instead of reading a pointer from a negative offset and de-referencing it?

As far as I can tell, that's GNU's/ELF's TLS model, which doesn't really translate nicely to #![no_std] code (sans OS) on x86_64.

// `#[thread_local]` statics may not outlive the current function.
for attr in &self.tcx.get_attrs(def_id)[..] {
    if attr.check_name("thread_local") {
        return Ok(self.cat_rvalue_node(id, span, expr_ty));
    }
}