rust-lang/rust

Specifying linkage on externs silently removes indirection

jethrogb opened this issue · 3 comments

When compiling the following code:

// externs-c.c
unsigned char myarr[10]={1,2,3,4,5,6,7,8,9,10};
unsigned char (*implicitvar)[10]=&myarr;
unsigned char (*explicitvar)[10]=&myarr;

// externs-rust.rs
#![feature(linkage)]

extern {
    static implicitvar: *const [u8;10];
    // Should have no effect, external linkage is the default in an extern block
    #[linkage="external"]
    static explicitvar: *const [u8;10];
}

fn as_option(p: *const [u8;10]) -> Option<&'static [u8;10]> {
    unsafe{std::mem::transmute(p)}
}

fn main() {
    println!("implicitvar = {:?}",as_option(implicitvar));
    println!("explicitvar = {:?}",as_option(explicitvar));
}

using

clang -c externs-c.c -o externs.o && rustc externs-rust.rs -o externs -C link-args=./externs.o

running ./externs will output something like the following:

implicitvar = Some([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
explicitvar = Some([168, 4, 122, 85, 85, 85, 0, 0, 0, 0])

Wat.

Taking a look at the IR:

; externs-c.ll
@myarr = global [10 x i8] c"\01\02\03\04\05\06\07\08\09\0A", align 1
@implicitvar = global [10 x i8]* @myarr, align 8
@explicitvar = global [10 x i8]* @myarr, align 8

; externs-rust.ll
@implicitvar = external global [10 x i8]*
@explicitvar = external global [10 x i8]
@_rust_extern_with_linkage_explicitvar = internal global [10 x i8]* @explicitvar

So Rust removes a layer of indirection: it declares static explicitvar: [u8;10] and adds a new variable static _rust_extern_with_linkage_explicitvar: *const [u8;10]=&explicitvar. All mentions of explicitvar in Rust source code are replaced with _rust_extern_with_linkage_explicitvar. As a result, the C declaration and the Rust declaration of the same symbol no longer have the same type! To get “correct” behavior in the example above, you would need to declare static explicitvar: *const *const [u8;10] instead.

This weird asymmetry between the types associated with symbols in Rust and in C is a source of great confusion and can easily lead to bugs. In the example above, we read 2 bytes past the end of an 8-byte pointer by interpreting its storage as a 10-byte array.
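The mismatch can be imitated in plain Rust, without the linkage feature or a C object file. What follows is only a sketch under stated assumptions: MYARR and EXPLICITVAR are stand-ins for the C symbols, and wrapper_value is a hypothetical helper that mimics what _rust_extern_with_linkage_explicitvar holds according to the IR above (the address of the symbol, typed as the declared type). It is not the compiler's actual code.

```rust
use std::mem::size_of;

// Stand-in for the C array `myarr`.
static MYARR: [u8; 10] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// Stand-in for the C symbol `explicitvar`: storage that holds a pointer to the array.
static EXPLICITVAR: &[u8; 10] = &MYARR;

// Hypothetical helper: the address of the symbol itself, typed as the
// declared type `*const [u8; 10]`, as in the generated wrapper global.
fn wrapper_value() -> *const [u8; 10] {
    (&EXPLICITVAR as *const &[u8; 10]).cast::<[u8; 10]>()
}

// Workaround from the issue: treat the value as `*const *const [u8; 10]`.
fn read_through_double_pointer() -> [u8; 10] {
    let outer = wrapper_value() as *const *const [u8; 10];
    // SAFETY: `outer` points at EXPLICITVAR, which holds a valid pointer to MYARR.
    unsafe { **outer }
}

// What the mistaken single dereference starts reading: the bytes of an
// address, not array data. Only pointer-size bytes are copied here, so
// this sketch never reads past the end of EXPLICITVAR.
fn leading_bytes_as_address() -> usize {
    let mut bytes = [0u8; size_of::<usize>()];
    // SAFETY: source and destination are both `size_of::<usize>()` bytes long.
    unsafe {
        std::ptr::copy_nonoverlapping(
            wrapper_value() as *const u8,
            bytes.as_mut_ptr(),
            bytes.len(),
        );
    }
    usize::from_ne_bytes(bytes)
}

fn main() {
    println!("via double pointer: {:?}", read_through_double_pointer());
    println!(
        "leading bytes are the address of MYARR: {}",
        leading_bytes_as_address() == MYARR.as_ptr() as usize
    );
}
```

Reading a full [u8; 10] through wrapper_value() would additionally touch two bytes beyond the pointer's storage, which is the out-of-bounds read described above, so the sketch deliberately stops at the pointer size.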

This weird behavior was introduced in #12556 (see also #11978), the rationale being weak linkage and the fact that some pointers can't be null in the Rust type system. While true, I don't think that's sufficient rationale to add this layer of indirection. I think the layer of indirection should be removed completely. For weak linkage, a restriction can be added to allow only zeroable types.

This seems like very strange behaviour to me. I'm not sure if "some pointers can't be null" is a sensible reason for this behaviour at all.

The support added in #12556 (the indirection here) was only really intended for weak symbols. Weak symbols are currently required to be pointers as otherwise this is not memory safe:

extern {
    #[linkage = "extern_weak"]
    static SYMBOL: extern fn();
}

because fn types aren't nullable. I believe the intent was for this linkage restriction to be detected at compile time in some typechecking pass, although I'm not sure if that was ever implemented.
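The nullability point can be illustrated without any linker involvement. This is a minimal sketch, assuming a hypothetical callback type Callback and hypothetical helpers call_if_present and unresolved (none of them come from #12556): a bare extern fn value can never be null, while an Option of it has the same size and uses the null address for None, so it could represent a weak symbol that was not resolved.

```rust
use std::mem::size_of;

// Hypothetical callback type standing in for the type of a weak symbol.
type Callback = extern "C" fn() -> i32;

// Hypothetical function playing the role of a symbol that did get resolved.
extern "C" fn forty_two() -> i32 {
    42
}

// Calls the symbol only if it is present.
fn call_if_present(sym: Option<Callback>) -> Option<i32> {
    sym.map(|f| f())
}

// Models an unresolved weak symbol: a null address.
fn unresolved() -> Option<Callback> {
    // SAFETY: all-zero bits are a valid `Option` of a fn pointer, namely `None`.
    unsafe { std::mem::zeroed() }
}

fn main() {
    // The Option wrapper adds no storage; the null address encodes `None`.
    assert_eq!(size_of::<Option<Callback>>(), size_of::<Callback>());
    println!("{:?}", call_if_present(Some(forty_two as Callback)));
    println!("{:?}", call_if_present(unresolved()));
}
```

Zero-initializing a plain Callback instead would be undefined behavior, which is the memory-safety problem with the extern fn() declaration above.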

I guess this bug in particular is about the non-weak pointer case. However, I think the weak pointer case also warrants more discussion, for which I've opened a topic on internals.r-l.o.

Here's the code in question btw: https://github.com/rust-lang/rust/blob/3e9589c/src/librustc_trans/trans/foreign.rs#L139-L145 . It does not distinguish at all between different types of linkage.