Tracking issue for the `linkage` feature
aturon opened this issue · 26 comments
Tracks stabilization for the linkage attribute.
Currently this translates directly to the various LLVM linkage types; for example, the values of the attribute can be:
match name {
"appending" => Some(llvm::AppendingLinkage),
"available_externally" => Some(llvm::AvailableExternallyLinkage),
"common" => Some(llvm::CommonLinkage),
"extern_weak" => Some(llvm::ExternalWeakLinkage),
"external" => Some(llvm::ExternalLinkage),
"internal" => Some(llvm::InternalLinkage),
"linkonce" => Some(llvm::LinkOnceAnyLinkage),
"linkonce_odr" => Some(llvm::LinkOnceODRLinkage),
"private" => Some(llvm::PrivateLinkage),
"weak" => Some(llvm::WeakAnyLinkage),
"weak_odr" => Some(llvm::WeakODRLinkage),
_ => None,
}
Some worries about this are:
- These are very LLVM-specific; it's unclear how applicable they are to other backends.
- Beyond external or weak, I've never seen a reason to use the other attributes.
- These linkage modes are platform-specific and aren't guaranteed to work everywhere.
That being said, it's the only way to do weak symbols on Linux, and it's also convenient for exporting a known symbol without worrying about privacy on the Rust side of things. I would personally want to reduce the set of accepted linkage types and then state "well, of course linkage is platform-specific!"
This feature is fundamentally broken. Consider the following code (a C object file plus a Rust program linked against it):
// C side (c_part)
static unsigned long B = 0;
unsigned long *A = &B;
void f(void) { }
// Rust side
#![feature(linkage)]
#[link(name = "c_part")]
extern {
    #[linkage = "extern_weak"] static A: *const usize;
    fn f();
}
fn main() {
    unsafe {
        f();
        println!("{:x} @ {:x} @ {:p}", *(*A as *const usize), *A, A);
    }
}
Which prints 0 @ 5617765adaa8 @ 0x5617765ad890, meaning that the A seen by the Rust code contains the address of A and not A itself. It's also easy to get LLVM to abort with this.
I think we should at least work on stabilising the weak and extern_weak linkages. Both are very useful and fairly widespread linkage options that seem to be supported (unlike many other options currently exposed) by at least all of the tier 1 targets.
To give an example of a use case for weak linkage in Rust code (I need weak linkage for libloading), one would be making global statics unique even between multiple versions of the same crate. Consider that, currently, if a binary links log = ^0.3 and log = ^0.4 some way or another, it will end up having two distinct global loggers. This could trivially be resolved with some use of the weak linkage option (as it ensures, at link time, only one instance of a global with the same name).
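A rough, purely hypothetical sketch of that idea (nightly-only; the symbol name and static below are invented for illustration): every version of the crate would emit its logger slot as a weak symbol under one agreed-upon exported name, so the linker keeps a single instance.
#![feature(linkage)]
use core::sync::atomic::AtomicUsize;
// Hypothetical shared symbol: each crate version exports its slot under
// this one name, and weak linkage lets the duplicate definitions collapse.
#[linkage = "weak"]
#[export_name = "__rust_global_logger_slot"]
static GLOBAL_LOGGER: AtomicUsize = AtomicUsize::new(0);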
That being said, #[linkage] should always infect whatever it applies to with unsafety. Consider for example these two crates:
// crate older version
#[linkage="weak"]
static mut FOO: u32 = !0;
// crate newer version
#[linkage="weak"]
static mut FOO: char = 'a';
when linked together, all uses of FOO as seen from the newer-version crate would be UB if the linker happened to choose to link in the value from the older version. This is functionally a transmute without size checks.
extern_weak is broken in such a dubious way that even though I pointed out two years ago that it was being used incorrectly in the stdlib, the bug still hasn't been fixed.
~$ cat strong.rs
extern {
    static __progname: *const u8;
}
fn main() {
    unsafe {
        println!(" __progname:\t\t{:?}", __progname);
        println!(" &__progname:\t\t{:?}", &__progname as *const _);
        if !__progname.is_null() {
            println!(" *__progname:\t\t{:?}", *__progname as char);
        }
    }
}
~$ cat weak.rs
#![feature(linkage)]
extern {
    #[linkage = "extern_weak"]
    static __progname: *mut *const u8;
}
fn main() {
    unsafe {
        println!(" __progname:\t\t{:?}", __progname);
        println!(" &__progname:\t\t{:?}", &__progname as *const _);
        if !__progname.is_null() {
            println!(" *__progname:\t\t{:?}", *__progname);
            if !(*__progname).is_null() {
                println!("**__progname:\t\t{:?}", **__progname as char);
            }
        }
    }
}
~$ rustc strong.rs
~$ rustc weak.rs
~$ ./strong
__progname: 0x7ffdfc7438ac
&__progname: 0x7f10560a5360
*__progname: 's'
~$ ./weak
__progname: 0x7f730b21f360
&__progname: 0x5574a33bb008
*__progname: 0x7fffe4ca38b2
**__progname: 'w'
gcc and clang handle this attribute correctly.
PS: The address of __dso_handle isn't actually too significant in __cxa_thread_atexit_impl, and &__dso_handle is only a few bytes off.
PPS: Wow, looks like I already explained this above (also two years ago). Maybe the NSA is trying to keep this potential remote code execution unfixed.
Ping @alexcrichton @nagisa what's the status here? Is there a bug here that can be solved / mentored?
@cramertj I personally consider this a perma-unstable issue for now. In general, symbol visibility and ABIs are something that historically rustc hasn't done much to specify and has had a lot of freedom over. We relatively regularly tweak ABIs, symbol names, etc. There's a very thin layer at the end (like a C ABI) which is pretty stable, but even that gets sketchy sometimes (#[no_mangle] deep in a module hierarchy?).
I think we've benefitted quite greatly from the symbol visibility flexibility we've had historically in terms of compiler refactorings and heading off regressions. It's hard to introduce a regression when you can't rely on the feature in the first place!
Along those lines I think there are definitely some select use cases where using something like #[linkage] is critical, but from what I've seen they tend to be few and far between and somewhat esoteric. A blanket and general #[linkage] attribute I think is way too powerful to solve this use case, and it'd be better to poke around at various motivational use cases to see if there's a more narrow solution.
(Plus, the whole #[linkage] attribute is incredibly platform/LLVM-specific, and I don't think we fully understand all the linkage modes in LLVM and how they apply to all platforms.)
Given that crate owners can't control how many instances of their crate will be included in a given binary, it seems that we really need a mechanism at least for weak linkage in stable Rust.
I got bit by the fallout from #29603 (comment), where rust-libloading gained a cc compilation step to work around this missing feature.
I would love some mechanism to merge statics and variables with the same value (and name, possibly).
Consider the following code:
// In my actual code, this is a more complicated proc macro.
macro_rules! special_number {
    ($value: expr) => {
        {
            // In my actual code, these static variables also have
            // the special `export_name` of
            // "\x01L_special_number_<unique_id>", where
            // `<unique_id>` is a unique identifier to avoid
            // symbol conflicts.
            #[link_section = ".data,__custom_special_section"]
            static SPECIAL_NUMBER: usize = $value;
            &SPECIAL_NUMBER
        }
    };
}
extern {
    fn consume_special_number(value: &usize);
}
pub fn main() {
    unsafe {
        consume_special_number(special_number!(42));
        consume_special_number(special_number!(42));
        consume_special_number(special_number!(42));
    };
}
This will generate three different static variables. I would love it if I could get Rust to merge these static variables into one single static variable. Using a const doesn't work (it does merge the values, but you can't provide link attributes on a const).
There are two ways the merging could be performed:
- By value. Statics with the same value (that opt in to merging) would be merged into a single static variable.
- By name. Statics with the same export_name (that opt in to merging) would be merged into a single static variable (and wouldn't result in duplicate symbol errors).
I would prefer option 2 (merging by name).
Perhaps this is what the linkonce_odr linkage type is for, but using the same export_name causes a duplicate symbol error.
The linkonce_odr and weak_odr linkage types are similar to this, I think, but don't work (in Rust) for merging globals/statics within a single translation unit. Rust could either extend them or introduce a new linkage type that does ODR merging within translation units.
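For reference, here is a minimal reproduction of the conflict mentioned above (the names are invented): giving two statics the same exported name is rejected today, so name-based merging isn't currently expressible.
#[export_name = "L_special_number_42"]
static A: usize = 42;
// rustc rejects the second definition with a "symbol already defined" error.
#[export_name = "L_special_number_42"]
static B: usize = 42;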
Should we stabilize weak linkage first? :) It would be useful for building a runtime. We can write a complex default function in Rust and export it with weak linkage; if a downstream crate provides its own implementation, the complex default function won't end up in the final binary. That could save code size, which is important in embedded environments.
We could even introduce a separate #[weak] attribute instead if the original is too LLVM-specific.
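A minimal sketch of the pattern described above (nightly-only; the handler name below is invented for illustration): the runtime crate ships a weak default, and an application can override it with a strong definition of the same symbol, in which case the linker discards the default body.
#![feature(linkage)]
// Runtime crate: a weak default implementation.
#[linkage = "weak"]
#[no_mangle]
pub extern "C" fn default_interrupt_handler() {
    // The complex default behavior lives here; it is kept only when no
    // strong default_interrupt_handler exists in the final link.
}
// Application crate (shown as a comment for illustration only): a strong
// definition with the same symbol name replaces the weak default above.
// #[no_mangle]
// pub extern "C" fn default_interrupt_handler() { /* custom behavior */ }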
What work is needed to stabilize #[weak] in extern "C" blocks?
I have a couple of ideas that might help stabilize weak linkage.
First of all, the syntax. The problem with weak linkage is that its semantics are "the address of this variable might be null", which is not allowed in Rust, so the compiler puts the real symbol behind a level of indirection, which is confusing. I propose, instead of an attribute, a type std::ffi::Weak<T>, which is only allowed at the top level on items in extern blocks, like so:
extern "C" {
static foo: Weak<fn(usize) -> *const u8>;
}
This would be special-cased by the compiler in the same way it is now, where the symbol foo is of type extern "C" fn(usize) -> *const u8 and the variable foo is a handle to it, but it would make it much clearer what the actual type of the symbol is, since Weak is obviously not a C type. The interface of Weak would be something like
fn as_ref(&self) -> Option<&'static T>;
unsafe fn as_ref_unchecked(&self) -> &'static T;
unsafe fn as_mut(&mut self) -> Option<&'static mut T>;
unsafe fn as_mut_unchecked(&mut self) -> &'static mut T;
Second, the semantics. Instead of "whatever LLVM's extern_weak does," Weak should be defined as:
On platforms that support it, an external symbol of type T and the name of the static item is created. If the symbol isn't present at link time or run time, no error is generated. If the program has loaded a dynamic library that defines the symbol, as_ref returns a reference to the symbol; otherwise it returns None. If the platform doesn't support this, a compile-time error is generated.
I think this is the behavior that people actually want, and it is supported on Linux, OSX, and Windows. Getting it to work this way requires providing some flags to the linker (-U foo on OSX and /ALTERNATENAME:foo=null_foo on Windows), and it would be much easier and less error-prone for the compiler to do this than the programmer.
Marking the overall linkage attribute as perma-unstable. We should review individual linkages (notably "weak") for stabilization, which may want to occur as a separate attribute or a value of a separate attribute.
Should a separate issue be opened to specifically discuss "weak" linkage (or any other desired linkage attributes), or should stable linkage attributes continue to be discussed here?
A small rant about weak linkage: it's really two or maybe three separate features stuffed into one. They have the same syntax in GNU extensions to C (__attribute__((weak))), and they use the same bit in ELF files (though not Mach-Os), but their semantics and use cases are different.
- A weak reference (LLVM's extern_weak) means "this symbol is allowed to not be defined; treat it as null". If the symbol is defined somewhere else, that definition does not have to be weak. On Darwin, this is sometimes used with OS APIs for backwards compatibility with old OS versions (somewhat outdated reference).
- A weak definition (LLVM's weak) means "there can be multiple copies of this symbol at (static) link time". Any given symbol name can have any number of weak definitions plus at most one strong definition. If there's a strong definition it wins (except sometimes it doesn't, because static library semantics are weird). If there are only weak definitions, an arbitrary-ish one wins. In C++, inline functions are generated as weak since they're guaranteed to have a unique address. On Windows this is known as COMDAT or selectany.
- On Darwin only, a weak definition also sometimes implies "force the dynamic linker to pick one symbol with this name across all libraries in the process", in contrast to the default behavior where symbols with the same name in different dynamic libraries are independent of each other. (On ELF, the behavior is determined by factors including the presence of DF_SYMBOLIC and STB_GNU_UNIQUE, but weakness doesn't affect it.) It's not possible to turn this off using an LLVM linkage value.
I find this situation quite confusing. I'm not suggesting we should try to rewrite the terminology that platforms have established, but I think we can at least clearly differentiate between weak references and weak definitions. For example, in @danielkeller's suggestion, instead of Weak<T> we could use WeakImport<T> or maybe WeakRef<T>. The linkage values are already different (weak versus extern_weak), but these could stand to have clearer names. And of course, documentation can help.
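A rough nightly-only sketch of how the two notions surface in today's linkage attribute (assuming an ELF target; the symbol names are invented):
#![feature(linkage)]
// Weak *definition*: may be overridden by a strong definition elsewhere.
#[linkage = "weak"]
#[no_mangle]
pub extern "C" fn maybe_overridden() {}
// Weak *reference*: the symbol may be absent; as discussed earlier in this
// thread, the extern_weak item behaves like a pointer that is null when the
// symbol is missing.
extern "C" {
    #[linkage = "extern_weak"]
    static optional_symbol: *const u8;
}
pub fn optional_symbol_present() -> bool {
    unsafe { !optional_symbol.is_null() }
}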
Yeah, I think that we should avoid conflating these if we ever decide to expose this as a feature. Weak imports behave quite well on Darwin (where they're extremely widely used, usually via macros like __OSX_AVAILABLE, which switch to a weak import when your minimum target OS version isn't recent enough).
That said, it's not clear what we need to do to make our implementation of them actually work -- #[linkage = "extern_weak"] on macOS does mark the symbol as N_WEAK_REF in the Mach-O, but macOS's linker still gets upset about it unless you also tell rustc to send the linker -U _symbol_that_may_be_weak (or -undefined dynamic_lookup, but we probably don't want to do that). Otherwise, while it is a weak symbol, it's not allowed to be undefined.
Oddly, everything actually works without the -U flag if the Rust code is built as a static library and linked into an Xcode build. I haven't looked into it, but this implies that... there's a workaround in Xcode for the linker's behavior? If so, this would be pretty hairy, to be honest, so frankly I hope it is not what's happening?
Nope! That was wrong. It turns out that the way it works on Apple platforms is that the symbol must exist at link time (e.g. on the host system) or you get the undefined symbol error. This is orthogonal to it being a weak reference/import, which just indicates whether it's allowed to not resolve. I suppose this might be to save people who try to weakly link against _get_entropy rather than _getentropy, and that sort of thing. Tragically, I doubt these are the semantics we'd want for this, since it's too host-specific.
Xcode does seem to work around this, using some shenanigans with .tbd and .map files, although...
For now, can we open up the restrictions of #[linkage] to allow for Option<fn()> as well as *const T/*mut T? That's the natural way to describe a nullable function pointer. Right now, the only way to extern_weak a function is to declare it as *const whatever and transmute: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=8b9866283aa51360db447f982b9839dd.
Update: the following now works:
extern "C" {
#[linkage = "extern_weak"]
static puts: Option<unsafe extern "C" fn(x: *const u8)>;
}
fn main() {
let str = b"called puts\n\0";
let p = unsafe { puts }.expect("puts isn't linked");
unsafe { (p)(str.as_ptr()) }
}
I propose that the linkage attributes other than "weak" and "extern_weak" should be disabled, since they have no practical uses, and furthermore, some unsupported values cause ICEs, such as #109681.
I came up with this idea when I came across #109681 and was trying to solve it. I am wondering whether this is a good idea. If so, I can implement this change.
"weak" linkage is extremely useful in embedded development. Could we get it implemented first?
const fns should not be allowed to be weakly linked, as their effective bodies would be unknown when evaluating consts: #134451
As @luojia65 and @HaoboGu said previously, weak symbols are very useful for runtimes in embedded applications. We are currently facing issues in riscv-rt with link-time optimizations and weak symbols in assembly code (247). Is it possible to stabilize #[linkage = "weak"] for non-const functions, at least? I have never contributed to the Rust core sources, but if I can help with anything, I'll try my best.