Specifying linkage on externs silently removes indirection
jethrogb opened this issue · 3 comments
When compiling the following code:
// externs-c.c
unsigned char myarr[10]={1,2,3,4,5,6,7,8,9,10};
unsigned char (*implicitvar)[10]=&myarr;
unsigned char (*explicitvar)[10]=&myarr;
// externs-rust.rs
#![feature(linkage)]
extern {
static implicitvar: *const [u8;10];
// Should have no effect, external linkage is the default in an extern block
#[linkage="external"]
static explicitvar: *const [u8;10];
}
fn as_option(p: *const [u8;10]) -> Option<&'static [u8;10]> {
unsafe{std::mem::transmute(p)}
}
fn main() {
println!("implicitvar = {:?}",as_option(implicitvar));
println!("explicitvar = {:?}",as_option(explicitvar));
}
using
clang -c externs-c.c && rustc externs-rust.rs -C link-args=./externs.o
running ./externs
will output something like the following:
implicitvar = Some([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
explicitvar = Some([168, 4, 122, 85, 85, 85, 0, 0, 0, 0])
Wat.
Taking a look at the IR:
; externs-c.ll
@myarr = global [10 x i8] c"\01\02\03\04\05\06\07\08\09\0A", align 1
@implicitvar = global [10 x i8]* @myarr, align 8
@explicitvar = global [10 x i8]* @myarr, align 8
; externs-rust.ll
@implicitvar = external global [10 x i8]*
@explicitvar = external global [10 x i8]
@_rust_extern_with_linkage_explicitvar = internal global [10 x i8]* @explicitvar
So, Rust removes a layer of indirection defining static explicitvar: [u8;10]
and adding a new variable static _rust_extern_with_linkage_explicitvar: *const [u8;10]=&explicitvar
. All mentions of explicitvar
in Rust source code get replaced with _rust_extern_with_linkage_explicitvar
. This results in the C version and this new Rust version not having the same type! To get “correct” behavior in the example above, you would need to define static explicitvar: *const *const [u8;10]
instead.
This weird assymmetry between the types associated with symbols in Rust and in C is a source of great confusion and can easily lead to bugs. In the example above, we just read 2 bytes past some pointer by interpreting it as a 10-byte array.
This weird behavior was introduced in #12556 (see also #11978), the rationale being weak linkage and the fact that some pointers can't be null in the Rust typesystem. While true, I don't think that's sufficient rationale to add this layer of indirection. I think the layer of indirection should be removed completely. For weak linkage, a restriction can be added to allow only zeroable types.
This seems like very strange behaviour to me. I'm not sure if "some pointers can't be null" is a sensible reason for this behaviour at all.
The supported added in #12556 (the indirection here) was only really intended for weak symbols. Weak symbols are currently required to be pointers as otherwise this is not memory safe:
extern {
#[linkage = "extern_weak"]
static SYMBOL: extern fn();
}
because fn
types aren't nullable. I believe it was intended that this linkage
restriction was detected at compile-time in some typechecking pass, although I'm not sure if that was ever implemented.
I guess this bug in particular is about the non-weak pointer case. However, I think the weak pointer case also warrants more discussion, for which I've opened a topic on internals.r-l.o.
Here's the code in question btw: https://github.com/rust-lang/rust/blob/3e9589c/src/librustc_trans/trans/foreign.rs#L139-L145 . It does not distinguish at all between different types of linkage.