rust-lang/rust

repr(simd) is unsound

Closed this issue · 68 comments

The following should be discussed as part of an RFC for supporting portable vector types (repr(simd)) but the current behavior is unsound (playground):

#![feature(repr_simd)]
#![feature(target_feature)]
#![allow(non_camel_case_types)]

// Given a SIMD vector type:
#[derive(Debug)]
#[repr(simd)]
struct f32x8(f32, f32, f32, f32, 
             f32, f32, f32, f32);

// and the following two functions:

#[target_feature = "+avx"]
fn foo() -> f32x8 { f32x8(0.,1.,2.,3.,4.,5.,6.,7.) }  // f32x8 will be a 256-bit vector

#[target_feature = "+sse3"]
fn bar(arg: f32x8) {  // f32x8 will be 2x128-bit vectors
    println!("{:?} != f32x8(0, 1, 2, 3, 4, 5, 6, 7)", arg);
    // prints: f32x8(0, 1, 2, 3, 6, 0, 0, 0) != f32x8(0, 1, 2, 3, 4, 5, 6, 7)
}

// what are the semantics of the following when
// executing on a machine that supports AVX?
fn main() { bar(foo()); }

Basically, those two objects of type f32x8 have different layouts, so foo and bar have different ABIs / calling conventions. This can be introduced without target_feature by compiling two crates with different --target-cpu values and linking them, but target_feature was used here for simplicity.

Two thoughts I've had in the past about how to fix this:

  • We can forbid this in trans I think by just making it a hard error. Basically when this happens give up and say "it's still your problem to fix it"
  • The compiler could automatically generate a shim to "do the right thing". I explained this a long time ago as well but the general idea is that &T always has the same ABI regardless of target_feature, and the compiler could abuse that.
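
A rough sketch of that shim idea, using a plain #[repr(C)] struct as a stand-in for f32x8 so the example compiles on stable (the real shim would be compiler-generated for repr(simd) types; F32x8, bar_impl, and bar_shim are illustrative names, not anything the compiler emits):

```rust
// Stand-in for f32x8. #[repr(C)] here only so the sketch builds on
// stable; the key property is that &F32x8 has the same ABI regardless
// of target features, while passing F32x8 by value may not.
#[repr(C)]
#[derive(Clone, Copy)]
struct F32x8([f32; 8]);

// The feature-sensitive, by-value signature.
fn bar_impl(arg: F32x8) -> f32 {
    arg.0.iter().sum()
}

// Hypothetical compiler-generated shim: the argument crosses the call
// boundary behind a reference, sidestepping register-based passing.
fn bar_shim(arg: &F32x8) -> f32 {
    bar_impl(*arg)
}

fn main() {
    let v = F32x8([0., 1., 2., 3., 4., 5., 6., 7.]);
    println!("{}", bar_shim(&v)); // 0 + 1 + ... + 7 = 28
}
```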

Given that #[target_feature] is unsafe, though, we could also just add it to the list of contracts you have to uphold to call the function. It's just one more checkbox to check when working with unsafe functions: ensuring that you call them with the same target_feature set if you pass SIMD arguments.


example of unsafety in C++

example in Rust

It's worth noting that, when talking about LLVM features (which Rust target features currently map directly to), this also affects floats (which are stable). I.e., on x86 with current safe Rust, if you compile one crate with -C target-feature=+soft-float and one without, you have an issue. This can also be solved as Alex mentions, though.

@alexcrichton I would prefer to start with a hard error, and if we need it, add a way to opt-in to the shim generation (*).

The only thing that concerns me about the hard error is that we will probably need to emit it during monomorphization. I think that this is not acceptable, and we should only do it if it's either temporary or there is no other way. @eddyb pointed out that currently stable Rust has no monomorphization-time errors, so this solution might block stabilization.

@alexcrichton you mentioned that this would mean that SIMD types must then be banned from FFI because we don't know the calling convention of the caller. Could you elaborate on this? As I see it, FFI is already unsafe, so it would be up to the user to make sure that the callee is using the appropriate calling convention.

(*) I haven't thought this through, but I imagine getting a hard error for a particular call site, and then wanting to opt-in for that particular call site only, into the shim generation. Anyways, we don't need to think this all the way through now.

I'd personally be totally OK with a hard error, but yes, I think we'd have to do it during monomorphization. It's true that we don't have many monomorphization errors today, but I don't think we have absolutely zero, and I'd personally also think that we should at least get to a workable state and try it out to evaluate before possibly stabilizing. I, personally again, would most likely be fine stabilizing with monomorphization errors.

@alexcrichton you mentioned that this would mean that SIMD types must then be banned from FFI because we don't know the calling convention of the caller. Could you elaborate on this?

Oh right yeah! Right now we've got a lint in the compiler for "this type is unsafe in FFI", and for example it lints about bare structs that are not #[repr(C)] and things like String. We'd just want to make sure the lint warned about all SIMD structures as well (that they're not accidentally considered allow-by-default).
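
As a sketch of that lint behavior: the existing improper_ctypes lint already flags String in an extern block, and the point above is that SIMD structs should be flagged the same way. The extern declaration below is purely illustrative (never called), and is_ffi_safe_placeholder is a made-up stand-in for the lint's judgement, not a real compiler API:

```rust
extern "C" {
    // Never called; it exists only so the lint has something to flag.
    // rustc warns here: `String` is not FFI-safe (improper_ctypes).
    fn takes_string(s: String);
}

// Hypothetical stand-in for the lint's judgement, so the example has
// something observable to run.
fn is_ffi_safe_placeholder(type_name: &str) -> bool {
    matches!(type_name, "u8" | "i32" | "f32")
}

fn main() {
    println!("String FFI-safe? {}", is_ffi_safe_placeholder("String"));
}
```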

@parched I don't believe we're considering a #[target_feature] attribute for soft-float and the #[target_feature] is a whitelisted list of attributes which today go straight to LLVM but we aren't bound to always doing so. I'd imagine this would be an excellent topic of discussion were we to stabilize such a target feature!

eddyb commented

@alexcrichton I've recently reviewed all non-ICE errors from librustc_trans and the only one that can be hit from stable is monomorphization recursion limit. Anything else would be a bug IMO.

While +soft-float may not be considered a valid argument for #[target_feature], disabling SSE does affect the ABI of floats. That is, this call would require a shim as well:

pub fn bar() {
    foo(0.0);
}

#[target_feature = "-sse,-sse2"]
pub fn foo(x: f32) -> f32 {
    x + x
}

The above code actually crashes in LLVM on the playground, because that's x86_64, where SSE2 is always present. However, on a 32-bit x86 target, bar passes the argument in xmm0 while foo loads its argument from the stack.

How about we make the target spec define the ABI regardless of what extra features are enabled or disabled by attributes or the commandline. That would avoid the need for shims and monomorphization errors. So for example on x86_64 which has 128-bit vectors by default:

  • -sse would be a hard error on the commandline or as an attribute on any function that has 128-bit (and float?) parameters
  • 256-bit vectors would be passed on the stack regardless of whether +avx is enabled. I don't think there would be a performance issue with this, because anywhere it mattered should be inlined.
  • If some functions require an ABI where 256-bit registers are used (e.g., some intrinsics, probably), then we add another ABI to explicitly tag functions with, e.g., extern "vector-256". Calling one of these from a function without the +avx attribute would be a hard error regardless of whether +avx was enabled for the whole crate via the commandline.

@rkruppe Thanks for that example. Two small comments:

  • Solving this particular problem is not a blocker for allowing SIMD (and friends) on stable Rust. The target_feature RFC does not allow disabling features. To introduce this problem one would need to do so at the -C target-feature level, and link two crates one with +sse and one with -sse (and this is a problem we already have, with basically any feature).

  • The most immediate goal is allowing SIMD on stable Rust. Ideally, whatever solution we implement first will solve this problem, but a solution that does not fix it can still be good enough as long as it doesn't prevent us from fixing this in the future.

I personally wouldn't like to have to re-open this topic in the future when people start filing bugs due to random segfaults because some crate in the middle of their dependency graph decided it was a good idea to add -sse in its build.rs script... So I think that we should try hard to come up with a solution that fixes all these ABI issues once and for all.

@parched some x86_64 targets already have 512-bit vectors. What do we do when AVX3 or 4 are released with support for 1024-bit vectors? Where does that path end?

Ideally, if I have an SSE dynamic library that exposes some functions on its ABI for SSE...AVX2, I would like to be able to add some new AVX3/4 functions to its interface, recompile, and produce a library that is ABI compatible with the old one, so that all my old clients can continue to work as is by linking to the new library, but newer code is able to call the new AVX3/4 functions. That is, adding those new AVX3/4 functions should not break the ABI of my dynamic library as long as my global target is still sse.

@parched some x86_64 targets already have 512-bit vectors. What do we do when AVX3 or 4 are released with support for 1024-bit vectors? Where does that path end?

@gnzlbg yes 512-bit and 1024-bit vectors would have to be treated the same way but I don't believe adding more would be an issue.

Ideally, if I have an SSE dynamic library that exposes some functions on its ABI for SSE...AVX2, I would like to be able to add some new AVX3/4 functions to its interface, recompile, and produce a library that is ABI compatible with the old one, so that all my old clients can continue to work as is by linking to the new library, but newer code is able to call the new AVX3/4 functions. That is, adding those new AVX3/4 functions should not break the ABI of my dynamic library as long as my global target is still sse.

For that case you would just have to make your new functions extern "vector-1024", and users wouldn't be allowed to call them unless the caller had #[target_feature = "+avx4"]. (EDIT: To be clear, you would only have to do this if your new function had 1024-bit parameters and you wanted to force them to be passed in registers rather than on the stack. If you just used the default extern "Rust", 1024-bit vectors would be passed on the stack, and so code would be allowed to call this function.)

@parched I think I misunderstood your comment then.

Then we add another ABI to explicitly tag functions with e.g., extern "vector-256".

Do you think that #[target_feature = "+avx"] could do this automatically?


@alexcrichton @BurntSushi I've slept on this a bit, and I think the following is a common idiom that we need to enable:

#[target_feature = "sse"]
fn foo(v: f32x8) -> f32x8 {
  // f32x8 has SSE ABI here
  let u = if std::host_feature(AVX) { 
      foo_avx(v)  // mismatched ABI: hard error (argument)
      // mismatched ABI: hard error (return type)
  } else {
      /* SSE code */
  }
  /* do something with u */
  u
}

#[target_feature = "avx"]
fn foo_avx(arg: f32x8) -> f32x8 { ... }

Here we have some mismatching ABIs. I am still fine with making these mismatching ABIs hard errors as long as there is an opt-in way to make this idiom work. What do you think about using as to cast between ABIs ?:

#[target_feature = "sse"]
fn foo(v: f32x8) -> f32x8 {
  // f32x8 has SSE ABI here
  let u = if std::host_feature(AVX) { 
      // foo_avx(v) // ERROR: mismatched ABIs (2x arg and ret type)
      // foo_avx(v as f32x8) // ERROR: mismatched ABIs (1x ret type)
      foo_avx(v as f32x8) as f32x8 // OK
  } else {
      /* SSE code */
  }
  /* do something with u */
  u
}

That is, an as cast to f32x8 inserts the shims that @alexcrichton was proposing above to make this work. I thought about using as #[target_feature = "sse"] f32x8 or similar but that seemed unnecessarily verbose. I also thought about making it just work, but the shims do introduce a cost, so I think it is better to make this cost explicit.

Do you think we can extend this to make function pointers work?:

#[target_feature = "+sse"] fn foo(x: f32x8) -> f32x8;
static mut foo_ptr: fn(f32x8) -> f32x8 = foo;

unsafe {
  // foo_ptr = foo_avx; // ERROR: mismatched ABI
  foo_ptr = foo_avx as fn(f32x8) -> f32x8; // OK
}

// assert_eq!(foo_ptr, foo_avx); // ERROR: mismatched ABIs
assert_eq!(foo_ptr, foo_avx as fn(f32x8) -> f32x8); //  OK

I was thinking that in this case, foo_avx would need to be wrapped into a function that inserts the ABI conversion shims. I was worried that doing this would introduce other issues: foo_ptr = foo_avx; assert_eq!(foo_ptr, foo_avx); // FAILS, because the address of foo_ptr and the address of foo_avx differ. But this will be an ABI mismatch error that won't compile, and doing the cast would compare the address of the wrapped function, which would return OK.

I think that pursuing this would require us to track the ABI of repr(simd) types alongside their type, that is, f32x8["avx"] != f32x8["sse"] but f32x8["sse"] == f32x8["sse3"] (because their ABIs are identical). Tracking the ABI of repr(simd) types alongside the type looks like a lot of implementation work. @eddyb Do you think that "something like this" could allow us to lift the monomorphization-time errors to type-checking?

I think that if we can lift these errors to type-checking:

  • that would fix this issue, because the unsound cases become type errors
  • it would allow users to opt into an extra run-time cost to gain some flexibility (passing types across ABIs)
  • it would fix @rkruppe's example because -sse (and soft-float) would effectively change the types of the floats, producing type errors
  • it allows making function pointers work for different ABIs (and function pointers are a common idiom for dispatching to different SIMD implementations).

Thoughts?


EDIT: even if we never stabilize repr(simd) and only expose some f32x8-like types in std, this is slowly turning into RFC material...


EDIT2: That is, this issue would be resolved by making the original example fail with a type error, and adding the as conversions (or similar) would need to be a subsequent step.

@gnzlbg

The target_feature RFC does not allow disabling features. To introduce this problem one would need to do so at the -C target-feature level, and link two crates one with +sse and one with -sse

For the record, that's not true, one just needs a target that doesn't have SSE enabled by default (or defaults to soft-float), such as the (tier 2) i586-* targets.

I do agree that we should find a proper solution right now, especially since the "cheap fixes" that I'm aware of (monomorphization-time error, or strongarm the ABIs into being compatible by explicitly passing problematic types on the stack) permit code that probably wouldn't work unmodified under a more principled solution. Unfortunately I don't have the time to dive into solutions right now, so I can't contribute anything but nagging at the moment :trollface:

@gnzlbg specifically, do you think that passing arguments like f32x8 is common enough to warrant ABI compatibility in one way or another? Presumably this doesn't happen at all in C because of the same ABI problem, right?

I'm also not sure we can ever get function pointers to "truly work" unless we declare "one true ABI" for these types, otherwise we have no idea what the actual abi of the function pointer is.

(this bit about function pointers is pushing me quite a bit into the camp of "just declare everything unsafe and document why")

Presumably this doesn't happen at all in C because of the same ABI problem, right?

Of course it happens, see this SO question, but users get warnings and undefined behavior pretty quickly and learn to work around this (that is, "don't do that", pass a float* and a size, manually write the shims you proposed using assembly, etc.).

An important point is that this can only happen when you have ABI-incompatible vector types. That is, if you are only using SSE through SSE4.2, then you never run into these issues, because they are introduced by AVX, which is relatively recent, and by AVX512, which is very rare (EDIT: on ARM you only have NEON, so this does not happen, and the new SVE completely works around this issue).

I'm also not sure we can ever get function pointers to "truly work" unless we declare "one true ABI" for these types, otherwise we have no idea what the actual abi of the function pointer is.

Why do we need one true ABI for these types? For example:

#[target_feature = "+sse"]
fn foo() {
  let a: fn(f32x8) -> f32x8;  // has type fn(f32x8["sse"]) -> f32x8["sse"]
}

#[target_feature = "+avx"]
fn bar() {
  let a: fn(f32x8) -> f32x8;  // has type fn(f32x8["avx"]) -> f32x8["avx"]
}

static a: fn(f32x8) -> f32x8;  // has type fn(f32x8["CRATE"]) -> f32x8["CRATE"]
// where CRATE is replaced with whatever feature the crate is compiled with

That is, two function pointers, compiled on different crates, or functions, with different features, would just be different types and generate a type error.

@alexcrichton

Another workaround in C is to do something like this:

First we need a way to merge two 128-bit registers into a 256-bit register (or a "no op" in SSE):

#[target_feature = "+sse"]
fn merge_sse(x: (f32x4, f32x4)) -> f32x8;  // no op?
#[target_feature = "+avx"]
fn merge_avx(x: (f32x4, f32x4)) -> f32x8;
// ^^^^ copy 2x128-bit registers to 1x256-bit register

then we need its inverse, that is, a function that takes a 256-bit value (or two in SSE) and returns two 128-bit registers:

#[target_feature = "+sse"]
fn split_sse(x: f32x8) -> (f32x4, f32x4); // no op?
#[target_feature = "+avx"]
fn split_avx(x: f32x8) -> (f32x4, f32x4);
// ^^^^ copy the parts of a 256-bit register into 2x128-bit registers

then we add some macros to communicate f32x8s from AVX to SSE and vice-versa using only 128-bit registers:

macro_rules! from_sse_to_avx { ($x:expr) => (merge_avx(split_sse($x))) }
macro_rules! from_avx_to_sse { ($x:expr) => (merge_sse(split_avx($x))) }
macro_rules! from_sse_to_avx_and_back { 
    ($f:expr, $x:expr) => (from_avx_to_sse!($f(from_sse_to_avx!($x)))) 
}

and then we can safely write the code above as:

#[target_feature = "sse"]
fn foo(v: f32x8) -> f32x8 {
  // f32x8 has SSE ABI here
  let u = if std::host_feature(AVX) { 
      // foo_avx(v)  // mismatched ABI: hard error (argument)
      from_sse_to_avx_and_back!(foo_avx, v) // OK
  } else {
      /* SSE code */
  }
  /* do something with u */
  u
}

#[target_feature = "avx"]
fn foo_avx(arg: f32x8) -> f32x8 { ... }
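
The data movement these shims perform can be sketched on stable Rust with plain arrays standing in for f32x4/f32x8 (no target features involved; split and merge are illustrative names for the register-shuffling above):

```rust
// Stand-ins: [f32; 4] for f32x4, [f32; 8] for f32x8.

// Split a 256-bit value into its two 128-bit halves.
fn split(x: [f32; 8]) -> ([f32; 4], [f32; 4]) {
    ([x[0], x[1], x[2], x[3]], [x[4], x[5], x[6], x[7]])
}

// Merge two 128-bit halves back into a 256-bit value.
fn merge(lo: [f32; 4], hi: [f32; 4]) -> [f32; 8] {
    [lo[0], lo[1], lo[2], lo[3], hi[0], hi[1], hi[2], hi[3]]
}

fn main() {
    let v = [0., 1., 2., 3., 4., 5., 6., 7.];
    let (lo, hi) = split(v);
    assert_eq!(merge(lo, hi), v); // split/merge round-trips losslessly
    println!("round-trip ok");
}
```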

Then we add another ABI to explicitly tag functions with e.g., extern "vector-256".

Do you think that #[target_feature = "+avx"] could do this automatically?

It could, but if you did that you wouldn't be able to call that function from another without #[target_feature = "+avx"], i.e., your runtime dispatch function.

@gnzlbg everything you're saying seems plausible? You're thinking that passing types like f32x8 needs to work in more scenarios, and I'd naively feel like that's not true (in that idiomatic C already doesn't do it), but I'm willing to defer to you. There's a whole smorgasbord of things we can do to get this working, but I'm still personally in camp "let's call it all unsafe and call it a day"

@alexcrichton the target_feature RFC allows #[target_feature] on unsafe fns only.

"let's call it all unsafe and call it a day"

What exactly do you propose to call/make unsafe? (*)

After exploring all these options, I'd like to propose a path forward.

  • the current implementation is unsound and results in UB at run-time (we are here)
  • make the current implementation sound by producing hard errors for ABI mismatches at monomorphization time (I don't think this cuts it for stabilization)
  • make the implementation produce a hard error for ABI mismatches during type checking (this should cut it for stabilization)

@eddyb said above that stable Rust has zero monomorphization time errors. I don't think that introducing one will cut it for stabilization. To produce a type-checking error, we need to "somehow" propagate the target_feature ABI of repr(simd) types along with the types themselves. Don't we need to do this as well for producing monomorphization time errors?

Once we are there users that want to convert between incompatible ABIs can do so with this idiom, which can also be extended to make function pointers work. We could provide procedural macros in a crate that do this automatically, and if that becomes a pain point in practice we could re-evaluate adding language support for that (e.g. something along the lines of the as f32x8 solution).

(*) If I understand this correctly, we would need to require that all repr(simd) types (and types containing those types) can only be used from unsafe code, but I am probably misunderstanding something.

@gnzlbg what I mean is that yes, #[target_feature] forces a function to be unsafe. Then if you call that function you must have some other not compiler-verified mechanism to know why it's safe. When calling a #[target_feature] function then there's simply an extra guarantee you must adhere to which is that it's ABI-safe to call the function. Namely you don't pass any arguments that change ABI based on the feature or you yourself are tagged with the right #[target_feature] to have the same ABI.

I personally think that at this point it's not worth trying to push this back into typechecking. That sounds like quite a lot of work for not necessarily a lot of gain. Additionally I'd be worried that it'd expose hidden costs and/or complexities that are very difficult to get right. For example if we had:

static mut FOO: fn(u8x32) = default;

#[target_feature = "+avx2"]
unsafe fn bar() {
    FOO = foo;
}

#[target_feature = "+avx2"]
unsafe fn foo(a: u8x32) {
}

unsafe fn default(a: u8x32) {
}

fn main() {
    bar();
    FOO(Default::default());
}

How do we rationalize that? Does the bar function generate a shim to put a different ABI into the global? Do we give errors for some compilation flags and not others?

Note that this doesn't even start to touch on trait methods. For example how do we also rationalize Default::default? Is that tagged with the necessary feature to return u8x32? Do we generate a specialized version for each invocation?

These just seem like really difficult questions to answer and I'm not personally sold on there being any real benefit to going to all this effort to statically verify these things. All of SIMD is unsafe anyway now, so trying to add static guarantees on top of something that we've already declared as fundamentally unsafe would be nice but to me doesn't seem necessary.

I'll try to explain what I am proposing better because I think we are misunderstanding each other.

First, we need to declare the repr(simd) type somewhere, for u8x32 this looks like this:

#[repr(simd)] 
struct u8x32(u8, u8, ...);

but what I am proposing is not to make repr(simd) types concrete, but to make them parametric on the ABI of the current context (e.g., which target features are enabled). So even though the user writes the code above, the compiler actually treats it like this:

#[repr(simd)]
struct u8x32<ABI>(u8, u8, ...);

When the user uses a repr(simd) type in a type position, the compiler automatically inserts the ABI type parameter. Let's show what I mean with your example. First, the set of target features enabled for a crate sets a global type parameter CRATE_ABI like this (the user does not see any of this; it is done implicitly by rustc):

type CRATE_ABI = /* from target features of the crate */; 

Now we proceed with the example. First, the user writes:

static mut FOO: fn(u8x32) = default; // OK, compiles
fn default(a: u8x32) { }

That compiles and type checks, because implicitly, the code looks like this:

static mut FOO: fn(u8x32<CRATE_ABI>) = default; // OK, compiles
fn default(a: u8x32<CRATE_ABI>) { }

So the types unify just fine. Note that default does not need to be unsafe. Users can use u8x32 in normal Rust functions just fine. Some library can implement traits for SIMD types that provide safe operations, so doing linear algebra on them can be done in safe Rust code (this is the end goal).

Now let's get a bit messier. The user writes:

#[target_feature = "+avx2"] unsafe fn foo(a: u8x32) {}
#[target_feature = "+avx2"]
unsafe fn bar() {
    FOO = foo;
}

but what this does is the following:

#[target_feature = "+avx2"] unsafe fn foo(a: u8x32<AVX2_ABI>) {}
#[target_feature = "+avx2"]
unsafe fn bar() {
    FOO = foo; // OK or Error?
}

So is this code OK, or is it an error? From the information provided, we cannot say: it depends on what the CRATE_ABI type is. Does foo: fn(u8x32<AVX2_ABI>) unify with FOO: fn(u8x32<CRATE_ABI>)? Well, if CRATE_ABI = AVX2_ABI the answer is yes, and this code will type check and be correct, but if it is SSE_ABI or something else, then probably not, and this code will produce a type error. On x86, NO_SSE, SSE, AVX, and AVX512 might be enough.
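
This unification story can be mimicked today with ordinary generics; here's a minimal sketch where Sse/Avx2 are hypothetical zero-sized ABI tag types and U8x32 is a plain stand-in, not real repr(simd) machinery:

```rust
use std::marker::PhantomData;

// Hypothetical zero-sized ABI tags.
struct Sse;
struct Avx2;

// Stand-in for u8x32, tagged with the ABI it was built under.
struct U8x32<Abi>([u8; 32], PhantomData<Abi>);

// A function whose signature is pinned to the SSE ABI tag.
fn takes_sse_abi(v: U8x32<Sse>) -> usize {
    v.0.len()
}

fn main() {
    let v: U8x32<Sse> = U8x32([0; 32], PhantomData);
    assert_eq!(takes_sse_abi(v), 32); // unifies: same ABI tag

    let _w: U8x32<Avx2> = U8x32([0; 32], PhantomData);
    // takes_sse_abi(_w); // type error: expected U8x32<Sse>, found U8x32<Avx2>
    println!("ok");
}
```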

So at this point we are ready to move to trait methods. What should this do?

fn main() {
    FOO(Default::default());
}

Well, the same thing it does for any other type: it is just type-checking at work. If it can unify the type parameters then everything is OK; otherwise, it does not compile. Obviously we need to nail down the ABI types so that code only compiles when it is safe, and breaks otherwise.

How do we rationalize that? Does the bar function generate a shim to put a different ABI into the global? Do we give errors for some compilation flags and not others?

I hope this has become clear, but just to be crystal clear: we never produce shims, either the ABI matches, or it doesn't. The users can manually write the shims if they need to by using this idiom.

For example how do we also rationalize Default::default?

I hope this has become clear.

All of SIMD is unsafe anyway now

I think I am misunderstanding what you mean here.

Right now, #[repr(simd)] types are safe to use; they are portable vector types, after all. Some operations on them might be unsafe, but you can use them in non-unsafe functions just fine.

Also, how exactly are you proposing to make repr(simd) types unsafe to use? It sounds to me like you want to make the types themselves unsafe, but Rust does not have unsafe types. That feature has been discussed before, but adding it to the language looks like an even bigger unknown to me at this point. So could you maybe expand on how you think this would work?

@gnzlbg So if I understand correctly, you would extend the type system in a way that would be visible to users in type mismatches? Would this also apply to float types?

Okay, thanks.

My impression after only very little thought is that I agree with @alexcrichton that a proper automatic solution to the ABI problems seems pretty complicated. If functions tagged with #[target_feature] are indeed always unsafe, then that seems sufficient for avoiding problems in safe code. So I'm partial to leaving it at that.

Letting unsafe code do its thing would mean that a strict solution such as the one @gnzlbg proposed can't be introduced later. However, it would still be possible to start generating shims based on the caller's and callee's set of target features.

I want to add, though: ABI mismatches are potentially very subtle and annoying bugs, so there should be a warning at least for the cases that can be easily detected statically.

@gnzlbg I think that makes sense, yeah, but I can't really comment on whether I'd think it's feasible or not. It sounds pretty complicated and subject to subtly nasty interactions; my worry would be that we'd spend all our effort chasing a long tail of bugs to make this airtight. Is this really a common enough idiom to warrant the need to provide a static error instead of discovering this at runtime?

Also, how exactly are you proposing to make repr(simd) types unsafe to use?

I haven't thought too hard about this, admittedly. If we expose APIs like Add for u8x32 then our hands may be tied in this regard, but we could also not do that and just expose unsafe fn u8x32::add(u8x32, u8x32) -> u8x32, which is less ergonomic to call but "does the right thing" in terms of ABIs.

@alexcrichton Why would <u8x32 as Add>::add be an issue? It's a safe method, yes, but it wouldn't have a #[target_feature] annotation and would therefore match the crate-wide default ABI. The issues only arise if code that does use #[target_feature] uses that Add impl, but that's the fault of the unsafe function using #[target_feature]. A footgun to be sure, but we already have that problem with user-defined functions that handle types whose ABI is sensitive to target features. Unless you want to make any and all mention of those types in function signatures unsafe, the users of #[target_feature] will just have to suck it up and be careful (hopefully aided by a lint, as mentioned previously).

Edit: On second thought, since the functions that use #[target_feature] are already unsafe, they could already use other unsafe functions such as u8x32::add without error or even warning, negating any "linting benefit" the unsafety may have.

And of course, there's the issue that f32 and f64 already implement Add (and countless other traits, and are used by countless other std APIs) and their ABI is also affected by #[target_feature]. That makes it pretty much impossible to make usage of target_feature-sensitive types unsafe.

OK, sorry, I was just trying to think of an example. Honestly, nothing feels concrete enough to debate or hypothesize about; at this point we're going to benefit from something being implemented and something happening, so that we can deal with concrete examples of what's happening and go from there.

We could at least "automatically shim" ABI mismatches when calling a fn item; in that case we always know the destination ABI (if the destination ABI contains "new" registers, we might have to create a "dummy function" to convince LLVM not to move the new instructions around if-statements, but there's no codegen obstacle to it).

When converting an fn item to an fn pointer (or equivalently, to an Fn trait object), I think the Right Thing would be to create an fn pointer with the default ABI, that does the necessary shimming. This is "obviously" what we want to do for a trait object, but less so for function pointers.

The disadvantage is that this might be surprising to people who want high performance with function pointers. Maybe we should make nonstandard-ABI fn item -> fn pointer conversions unsafe instead?

What about this case?

// crate A: -C target-feature=sse
fn foo() -> u8x32;

// crate B: -C target-feature=avx
fn bar(u8x32);

// crate C -C target-feature=sse
extern crate A;
extern crate B;
fn baz() {
  B::bar(A::foo());
}

Neither unsafe nor #[target_feature] are used anywhere, but it is unsound.

I don't see a way around "tagging" repr(simd) types usage with the ABI they are being used with if we want to produce "accurate" errors.


@arielb1 if the ABI errors result in any kind of compile-time error (not necessary a type error), the unsoundness is gone. The users can then manually write the shims.

The question then becomes how do we make this more ergonomic? We can automatically insert shims, or we can provide an explicit cast for users to opt into them being inserted. But this is just an ergonomic improvement, it does not allow anything new.


@alexcrichton @rkruppe the logic required to detect where shims are needed and which shims must be added is the same logic required to solve the unsoundness issue by making it an error. I think it is better to start there and then re-evaluate.

@gnzlbg

What about this case?

Don't we store these things in metadata? I think we should prevent linking together crates created with different ABIs. There are quite a few ways you can cause a mess with that.

The logic required to detect where shims are needed and which shims must be added is the same logic required to solve the unsoundness issue by making it an error.

The problem with that is that it creates monomorphization-time errors, which are something we like to avoid. Unless we do something more violent, like preventing all cross-target-feature calls where the ABI might be unclear (and making fn() -> f32x4 {nonstandard target feature} not implement Fn() -> f32x4).

@alexcrichton

Is this really a common enough idiom to warrant the need to provide a static error instead of discovering this at runtime?

How do you propose to discover these errors at run-time? Once the user calls the function with the wrong ABI, the behavior is undefined. Given that the target_feature RFC requires run-time feature detection, we could have an instrumented build that, at run-time, checks the features of the target CPU and panic!s before undefined behavior from an ABI mismatch can be invoked. I think we all agree that a static solution would be better, but something like this might be doable as well.
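
A minimal sketch of that instrumented-build idea (abi_mismatch and check_abi_or_panic are illustrative names, and a single "needs AVX" flag stands in for whatever feature set the callee was compiled with; the x86 detection uses std's is_x86_feature_detected!, stable since Rust 1.27):

```rust
// Does the host CPU support AVX? Falls back to false off x86_64.
fn host_has_avx() -> bool {
    #[cfg(target_arch = "x86_64")]
    let avx = is_x86_feature_detected!("avx");
    #[cfg(not(target_arch = "x86_64"))]
    let avx = false;
    avx
}

// Would calling a callee compiled with the given AVX requirement be an
// ABI mismatch on this host?
fn abi_mismatch(callee_needs_avx: bool) -> bool {
    callee_needs_avx && !host_has_avx()
}

// The guard the instrumented build would insert before an ABI-sensitive
// call: panic! instead of invoking undefined behavior.
fn check_abi_or_panic(callee_needs_avx: bool) {
    if abi_mismatch(callee_needs_avx) {
        panic!("ABI mismatch: callee compiled for AVX, host lacks AVX");
    }
}

fn main() {
    // A callee that only needs the baseline ABI is always safe to call.
    check_abi_or_panic(false);
    println!("baseline call ok");
}
```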

Also, how are you exactly proposing to make repr(simd) types be unsafe to use?

I haven't thought too hard about this, admittedly. If we expose APIs like Add for u8x32 then our hands may be tied in this regard, but we could also not do that and just expose unsafe fn u8x32::add(u8x32, u8x32) -> u8x32, which is less ergonomic to call but "does the right thing" in terms of ABIs.
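The unsafe-function alternative might look like the following sketch. `u8x32` is modeled as an array wrapper so the snippet compiles on stable (the real type would be `#[repr(simd)]`, and the body would be a vector instruction rather than a scalar loop):

```rust
// Sketch of the "unsafe intrinsic-style API" alternative: instead of
// a safe `impl Add for u8x32`, expose an unsafe function whose safety
// contract includes calling it under a compatible target_feature set.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct u8x32(pub [u8; 32]);

/// # Safety
/// The caller must ensure the arguments were produced under a
/// target_feature configuration compatible with this function's ABI.
pub unsafe fn u8x32_add(a: u8x32, b: u8x32) -> u8x32 {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = a.0[i].wrapping_add(b.0[i]);
    }
    u8x32(out)
}

fn main() {
    let r = unsafe { u8x32_add(u8x32([1; 32]), u8x32([2; 32])) };
    assert_eq!(r, u8x32([3; 32]));
}
```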

I see. I agree with you in all what you say. I guess we just need to figure out what the "right thing" is.


@arielb1

Don't we store these things in metadata? I think we should prevent linking together crates created with different ABIs. There are quite a few ways you can cause a mess with that.

I don't know, but if we don't store these in metadata we should, and we should check this before linking, and fail.

The problem with that is that it creates monomorphization-time errors, which are something we like to avoid.

I think that having a monomorphization-time error is better than adding an implicit run-time cost in performance-sensitive code (otherwise, why is the user bothering with SIMD?). The consensus seems to be that lifting the monomorphization-time errors to type checking would be the ideal solution, but as @rkruppe and @alexcrichton correctly point out, doing this is not trivial, and the number of users that will run into these ABI issues is probably very small, which means it might not be worth the effort.

I think it would be great to start by generating monomorphization-time errors. It is the simplest actionable task that fixes the soundness issue, and once we are there, we will be in a better place to re-evaluate the tradeoffs of the three solutions that have been discussed here: lifting the errors to type checking, implicitly adding shims, or turning these into run-time errors.

One annoying thing with this is that it affects closures:

#[target_feature = "+avx2"] 
fn call_via_vtbl(f: &Fn() -> u8x32) -> u8x32 {
    f()
}

#[target_feature = "+avx2"] 
fn call_via_generic<F>(f: &F) -> u8x32
    where F: Fn() -> u8x32
{
    f()
}

#[target_feature = "+avx2"]
unsafe fn foo(a: u8x32) -> u8x32 {
    call_via_vtbl(&|| a);
    call_via_generic(&|| a)
}

If we go by the standard "type-system" rules, the vtable entry in &Fn() -> u8x32 must have the "standard" ABI, and therefore call_via_vtbl will cause a monomorphization error/require a pair of shims. OTOH, people expect trait object calls to be somewhat slow, so I'm not sure they would be so annoyed by that.

With call_via_generic the situation is theoretically similar, but if we are smart enough we can avoid the unneeded pair of shims (because there is no vtable).

EDIT: this is wrong, I misunderstood what @arielb1 was pointing out.

@arielb1

That example is sound and does not require any shims. There is only one ABI in use, the AVX ABI (it's the same as the AVX2 ABI for u8x32).

If we go by the standard "type-system" rules, the vtable entry in &Fn() -> u8x32 must have the "standard" ABI,

u8x32 is a portable vector type; it has the ABI of the context where it is used (this is per LLVM). Here, &Fn() -> u8x32 is used in an AVX context, so it has the AVX ABI. That is, your foo function is correct and does not need any shims. However, this should error:

#[target_feature = "+sse2"]
unsafe fn bar(a: u8x32) -> u8x32 {
    call_via_vtbl(&|| a); // ERROR: 
    // ^^ closure &|| a : Fn() -> u8x32 has SSE ABI incompatible with AVX ABI in &Fn() -> u8x32
    call_via_generic(&|| a); // ERROR:
    // ^^ trait bound not satisfied: u8x32 has SSE ABI which is incompatible with the AVX ABI in where F: Fn() -> u8x32
}

because the SSE ABI is incompatible with the AVX ABI for u8x32. The alternative is that bar wraps the closure in a shim with the proper ABI that does the ABI conversion dance before calling the closure that the user wrote.

@arielb1 Nevermind, I think I see the issue you are pointing to now. The closure trait is declared somewhere else (trait Fn() -> u8x32 { ... }), so it will have the ABI of wherever it is declared? If so, the example is only unsound if the trait wasn't compiled with the AVX ABI. This would apply to any trait using portable vector types in its signature:

trait Foo {
  fn foo() -> u8x32;
}

An option would be to make repr(simd) types not object safe. Another option would be to make them parametric over the ABI, so that the trait above (and the closure traits) becomes:

trait<ABI> Foo {
  fn foo() -> u8x32<ABI>;
}

Doing this would solve your example without shims. Otherwise, I think it is a bit weird that the vtable case requires shims while the generic case does not. This would also allow users to manually insert the shims when they get a type error, making the costs explicit.

@gnzlbg

How do you propose to discover these errors at run-time?

Oh, I just mean that "segfault" or "wrong results" tell you that you did something wrong, not actually instrumenting the program.

I think we should prevent linking together crates created with different ABIs.

Unfortunately we'll need to get this working somehow, or at least some subset. We won't be compiling the standard library with +avx2, for example, but you may be compiling downstream code with that. Ideally we'd recompile the whole standard library but we're not really at that point just yet...

Historically I've concluded that this just means that all SIMD-related functions in libstd are #[inline], and that should solve the problem AFAIK for SIMD.
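Why `#[inline]` helps can be sketched like this (an array stands in for the vector type, as an assumption so the snippet compiles on stable): because the body is available across crates, it is codegen'd in the *caller's* crate with the caller's target features, so the libstd/downstream feature mismatch never materializes as a call across an ABI boundary.

```rust
// Sketch of the #[inline] workaround: with the body available to
// downstream crates, it is compiled in the caller's codegen unit with
// the caller's target features, so no call crosses a feature
// boundary.
#[inline]
pub fn add8(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut out = [0.0; 8];
    for i in 0..8 {
        out[i] = a[i] + b[i];
    }
    out
}

fn main() {
    assert_eq!(add8([1.0; 8], [2.0; 8]), [3.0; 8]);
}
```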

The consensus seems to be that lifting the monomorphization-time errors to type-checking would be the ideal solution

I personally don't quite agree with this, my ideal solution would be to either (a) declare it unsafe and not do anything other than the lint that @rkruppe mentioned or (b) insert compiler-generated shims to get code working.

I'm a fan of (a) because I haven't seen a compelling argument yet for why these patterns will arise a lot in practice. I'm also a fan of (b) because it seems much simpler to implement (although still not easy) and while it affects performance it only affects it in (AFAIK) "weird" situations, again b/c I haven't seen a compelling argument for why this is common.

I suppose that in the world of "what I wish for comes about without thinking about any consequences" I'd probably choose the error route, because this is indeed perf-sensitive code and you want to be alerted to situations where you made a mistake rather than either segfaulting or silently getting much slower. That being said, there are consequences to having a typeck error here, and I'm not sure it's worth it (adding an ABI parameter to all traits? to all functions?).

With the exception of the multi-crate scenario (which includes cases with floats), can this arise in safe code?

If we go with @alexcrichton's route (a), we can close this as "works as intended" and file an issue to try to warn on this as @rkruppe proposed.

I don't really understand why we don't just use the existing language feature for differing ABIs to solve this, namely extern "<ABI>". The problem of target-features adding new registers has been solved like this before.

Take, for example, the Arm soft-float targets, e.g., arm-unknown-linux-gnueabi. These don't have floating point registers available by default, so float arguments are passed in general purpose registers. Even if you enable any of the floating point unit versions (vfp2, vfp3, vfp4, etc.) with target-feature, float arguments are still passed in GPRs. You have to use extern "aapcs-vfp" [1] if you want floats to be passed in floating point registers.

Why not do the same for x86_64, with separate explicit ABIs for AVX and AVX-512? [2]


1๏ธโƒฃ Rust doesn't currently expose this because no one has asked for it, it would be trivial to add because it's a calling convention that LLVM supports just like extern "aapcs", which rust does expose. No one has asked for it probably because any performance critical code where it mattered would be marked #[inline] anyway. I think that would likely be the case for x86_64 too.

2๏ธโƒฃ I propose passing 256+ bit wide vectors on the stack for the default rust ABI as that's the simplest to implement in the compiler and is what Clang does when AVX features aren't enabled. They could also be split into smaller registers and passed that way, but as the rust ABI is unspecified it can be changed at a later date anyway to whatever is deemed best.

Putting the target features into the function ABI is an interesting approach. I had reasons why it might not work very well, but upon further thought I realize that most of these problems are inherent to correctly representing the impact on the ABI in the type system. On the contrary, using the ABI string avoids some issues of the "f32x4&lt;ABI&gt;" approach.

I'm still not a fan of a type system solution, but if anyone wants to keep arguing for that, I would strongly suggest putting it in the ABI string as @parched suggested.

Nominating for prioritization.

triage: P-medium -- no urgent action required

But this may want to block target-feature stabilization (I left a comment there).

@parched Can you show a concrete example of how that would work? This is what I understand, but I am not sure I got everything:

// Crate compiled for x86, no -C target-feature passed, also SSE ABI
#![feature(repr_simd)]
#![feature(target_feature)]
#![allow(non_camel_case_types)]

// Portable vector type, can be used for all ABIs, but UBs on incompatible ABIs:
#[derive(Debug)] #[repr(simd)]
struct f32x8(f32, f32, f32, f32, f32, f32, f32, f32);

extern "SIMD_AVX" {
  // the return type will be a 256bit vector
  #[target_feature = "+avx"]
  unsafe fn foo() -> f32x8 { f32x8(0.,1.,2.,3.,4.,5.,6.,7.) } 
}

// the return type will be 2x128bit vectors
#[target_feature = "+avx"]
unsafe fn bar() -> f32x8 { 
  // this function uses AVX but must follow the crate ABI (SSE in this case),
  // that is, it can use AVX operations,
  // but the args/result must be converted from/to its 
  // ABI to the AVX ABI to do so
  f32x8(0.,1.,2.,3.,4.,5.,6.,7.) 
} 

#[target_feature = "+sse3"]
unsafe fn baz(arg: f32x8) {  // f32x8 will be 2x128bit vectors
    println!("{:?} ?= f32x8(0, 1, 2, 3, 4, 5, 6, 7)", arg);
}

fn main() { 
  unsafe {  // all the functions are unsafe per the RFC
      // foo has the SIMD_AVX abi, but baz the SIMD_SSE ABI:
      baz(foo()); // ???: incompatible ABI error? or run-time UB? or shims?
      // both bar and baz have the SIMD_SSE ABI:
      baz(bar()); // OK: ABIs match

      // u must have the crate abi (SIMD_SSE), but foo returns with the SIMD_AVX abi:
      let u = foo();  // ???: incompatible ABI error? or run-time UB? or shims?
      // here I am doing a dance between function calls to convert foo's f32x8 result with
      // the SIMD_AVX abi to a f32x8 result with the SIMD_SSE ABI (see below):
      let v = sse_from_avx!(foo()); // OK ???
   }
}

extern "SIMD_AVX" {
  #[target_feature = "+avx"]
  unsafe fn split_avx(f32x8) -> (f32x4, f32x4);
}

extern "SIMD_SSE" {
  #[target_feature = "+sse"]
  unsafe fn merge_sse((f32x4, f32x4)) -> f32x8;
}

macro_rules! sse_from_avx { 
    ($x:expr) => (  merge_sse(split_avx($x))  )
}

I haven't thought about function pointers or trait objects with this approach yet, but @parched maybe could you clarify if this is what you mean, and what you expect in this example to happen, or provide a modified example?

@parched So IIUC the following would be impossible with what you are proposing, right?

#[target_feature = "+sse"]
fn foo(x: f32x8) -> f32x8 {
  if host_supports("avx2") {
    fast_avx2_algo(x)  // ERROR: can't work around this
  } else {
    slow_algo(x)
  }
}

E.g., fast_avx2_algo needs to be like bar, not foo.

I don't think the ABI string approach needs to be tied to any particular strategy for resolving the ABI mismatches. You could use the ABI information to prohibit all calls with ABI mismatches, but you could also use it to generate shims. You could even leave everything unsafe and in the hand of the programmers, but use the ABI strings as something of a lint for function pointers (this is likely not worthwhile, but it's conceivable).

@parched so if I have an SSE f32x8, how would I pass it to a function that has the AVX ABI and takes an f32x8 as argument? Is there a way to do this under your proposal?

E.g., I could manually split the SSE f32x8 into two SSE f32x4s, and then pass them to an AVX function that takes these two and merges them into a single AVX f32x8. However, if I cannot call a function with an AVX ABI from another one that does not have this ABI, then there is no way I can pass this AVX f32x8 to my original function.
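For concreteness, the split/merge dance described above might look like the following sketch. Arrays stand in for f32x8/f32x4 (an assumption so it compiles on stable); real code would shuffle between vector registers instead of copying slices.

```rust
// Sketch of the manual split/merge dance: an f32x8 is split into two
// f32x4 halves (whose layout is the same under SSE and AVX), moved
// across the ABI boundary, and reassembled.
fn split(v: [f32; 8]) -> ([f32; 4], [f32; 4]) {
    let mut lo = [0.0; 4];
    let mut hi = [0.0; 4];
    lo.copy_from_slice(&v[..4]);
    hi.copy_from_slice(&v[4..]);
    (lo, hi)
}

fn merge(lo: [f32; 4], hi: [f32; 4]) -> [f32; 8] {
    let mut out = [0.0; 8];
    out[..4].copy_from_slice(&lo);
    out[4..].copy_from_slice(&hi);
    out
}

fn main() {
    let v = [0., 1., 2., 3., 4., 5., 6., 7.];
    let (lo, hi) = split(v);
    assert_eq!(merge(lo, hi), v);
}
```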

so if I have an SSE f32x8, how would I pass it to a function that has the AVX ABI and takes as argument an f32x8? Is there a way to do this under your proposal?

I'm not sure what you mean? There isn't really such a thing as an "SSE f32x8" or an "AVX f32x8"; there's just f32x8, which the compiler can store wherever it chooses, e.g., in memory, in SSE registers, or in AVX registers if available. It only has to store them in a specific place when doing a function call, and that is specified by the ABI.

There isn't really such a thing as an "SSE f32x8" or an "AVX f32x8"

What I mean is that the layout is different. An f32x8 on an SSE ABI has a different layout than an f32x8 on an AVX ABI. This matters because I need to know the layout of these types to be able to call the appropriate intrinsics on them:

#[target_feature = "+sse4.1"]
fn foo() -> f32x8;  // opaque

#[target_feature = "+avx"]
fn baz() -> f32x8 {
  let y: f32x8 = f32x8::zeros();
  // calling an SSE function from AVX (incompatible ABIS):
  let x: f32x8 = foo(); 
  // foo returns in 2x128bit registers
  stdsimd::_mm256_full_hadd_ps(x, y) // is this ok?
  //^^ this function expects the arguments to be passed in
  // 256bit registers
}

I need to know the layout of x in the function above. You said before:

You wouldn't be allowed to call foo inside main because main
doesn't have #[target_feature("+avx")]

and also

No that would be fine, provided fast_avx_algo has the default ABI, not the
avx one.

What do these comments mean? Does your approach allow calling function across mismatching ABIs? (this is a yes/no question, followed by a "how is this done" if the answer is yes).

@rkruppe says:

You could use the ABI information to prohibit all calls with ABI mismatches, but you could also use it to generate shims.

Sure. But if we error, we need a way to fix the error, and if we automatically generate shims everywhere, we might want a way to detect where this happens. I am leaning towards a "let's just automatically generate shims everywhere and warn when this happens".

Note that "disallowing function calls with mismatching ABIs" is not a thing in current Rust: a function with ABI "abi1" calling a function with ABI "abi2" is currently always legal. If anyone wants to use ABI strings but prohibit calls with ABI mismatches, that would be a new feature entirely separate from ABI strings.
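For example, this cross-ABI-string call compiles and runs fine today; the compiler simply emits the callee's calling convention at the call site:

```rust
// Calls across ABI strings are already legal in Rust: a "Rust"-ABI
// function calling a "C"-ABI function Just Works.
extern "C" fn add_c(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    // "Rust" ABI caller, "C" ABI callee: no error, no explicit shim.
    assert_eq!(add_c(2, 3), 5);
}
```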

stdsimd::_mm256_full_hadd_ps(x, y) // is this ok?
//^^ this function expects the arguments to be passed in
// 256bit registers

Yes, that looks fine, but what exactly do you mean by "expects the arguments to be passed in 256-bit registers"? How is it defined?

If by mismatching ABIs you mean calling a function with one ABI from another function with a different ABI then yes, that would be allowed like you can call extern "C" from a normal rust function. The only thing that would be disallowed is calling a function with an ABI that requires some registers (e.g., AVX) from a function that doesn't have them enabled.

@parched

It is defined as:

#[target_feature = "+avx"]
fn _mm256_full_hadd_ps(a: f32x8, b: f32x8) -> f32x8;

If by mismatching ABIs you mean calling a function with one ABI from another function with a different ABI then yes, that would be allowed like you can call extern "C" from a normal rust function. The only thing that would be disallowed is calling a function with an ABI that requires some registers (e.g., AVX) from a function that doesn't have them enabled.

Since the function from which it is called (baz) has the registers enabled (it is annotated with avx), then IIUC in your proposal this will be allowed. Then my next question is, will this result in undefined behavior?


@rkruppe

Note that "disallowing function calls with mismatching ABIs" is not a thing in current Rust. You can always call an "abi1" function from an "abi2" function. A function with ABI "abi1" calling a function with ABI "abi2" currently always legal. If anyone wants to use ABI strings but prohibit calls with ABI mismatches, that would be a new feature entirely separate from ABI strings.

Sure, this is the status quo, and it results in undefined behavior at run time. In the end, this might be what we actually want, but IIUC this behavior has been "accidental" rather than "deliberate".

@gnzlbg I mean how is it actually defined, e.g., written in Rust, assembly, or an LLVM builtin? As you have declared it there with the default ABI, it wouldn't receive the arguments in 256-bit registers; for that it would need extern "avx".

I mean how is it actually defined, e.g., written in Rust, assembly, or an LLVM builtin?

It's written in Rust and calls one LLVM intrinsic. Hypothetically, it would need to be defined extern "avx", since the LLVM intrinsic it calls will need extern "avx" as well.

Recasting the previous example to use extern "ABI":

extern "SSE-SIMD-ABI" {
    #[target_feature = "+sse4.1"]
    fn foo() -> f32x8;  // opaque
}

extern "AVX-SIMD-ABI" {
    #[target_feature = "+avx"] 
    fn _mm256_full_hadd_ps(f32x8, f32x8) -> f32x8;  // opaque
}

extern "AVX-SIMD-ABI" {
    #[target_feature = "+avx"]
    fn baz() -> f32x8 {
      let y: f32x8 = f32x8::zeros();
      // calling an SSE function with SSE ABI from an AVX function with AVX ABI
      let x: f32x8 = foo(); 
      // foo returns in 2x128bit registers
      _mm256_full_hadd_ps(x, y) // is this ok?
      //^^ this function expects the arguments to be passed in
      // 256bit registers
  }
}

Can I call foo here? And if I can call it, can I call the _mm256_full_hadd_ps function afterwards?

@gnzlbg I'm talking purely about the mechanism of ABI strings, and that is certainly not an accident. All the existing "ABI string mismatches", such as calls from a "Rust" ABI function into a "C" ABI function, Just Work. If other things that affect "the ABI", such as target features, are also included in the ABI strings, then we may have to re-evaluate that, but that would be a new feature, not a natural consequence of having called between functions with different ABI strings. For that matter, I would argue that the more natural choice would be to generate shims rather than generating errors, because it's consistent with how ABI string mismatches are not an issue currently. The accident, if any, is that LLVM can't handle (direct) calls between functions with different target features naturally!

I am pointing this out not to argue for or against such a feature, but to stress that the solution @parched is advocating (or at least, the parts that y'all are discussing in depth) is entirely orthogonal to ABI strings.


That said, I currently lean towards putting the effect target_feature has on the ABI into the ABI string [1], and generating shims both for calls and when reifying function pointers by looking at the ABI strings -- which plays nice with the plans to allow coercions to function pointers of different ABIs (e.g., getting a extern "C" fn() from a function declared with "Rust" ABI) via shims.

Compared to the "always generate shim for the default ABI when reifying a function pointer" solution, this is somewhat more flexible: It allows handling pointers to functions that have certain target features enabled, rather than forcing all indirect calls to go through shims. This doesn't work for Fn* generics and trait objects (can trait methods even have non-"Rust" ABIs?), but it's something.

By the way, because the impact of target_feature is orthogonal to other aspects of the ABI, the ABI strings should not be "AVX-SIMD-ABI", "SSE-SIMD-ABI" etc. but (straw man syntax) "Rust,+avx", "C,+avx", "Rust,+sse", "C,+sse" etc.


[1] It need not affect the ABI string if the function signature isn't affected by the target feature in question.

Can I call foo here? And if I can call it, can I call the _mm256_full_hadd_ps function afterwards?

Yes and yes.

@parched

Yes and yes.

Ok, so how does that work? Because foo returns the f32x8 in two 128bit registers and _mm256_full_hadd_ps expects a single 256bit register.

Does the compiler automatically perform this conversion? (e.g. by automatically inserting shims?)


@rkruppe We started from the assumption that we could detect ABI mismatches somehow, and ABI strings are one way to do it. They're already there, and if it works, then great. The other parts of the discussion are whether we want to detect this at all (or just let it be undefined behavior), and if we want to do something about it, then what (automatic shims, error, manual shims, etc.).

I am pointing this out not to argue for or against such a feature, but to stress that the solution @parched is advocating (or at least, the parts that y'all are discussing in depth) is entirely orthogonal to ABI strings.

I still don't know what @parched is proposing. He proposed ABI strings to detect incompatible ABIs, and to be able to manually choose an ABI. I think that's a great solution. Simultaneously, he proposed to sometimes make it an error ("you can't call this or that function"), but from his last comment, it seems that he is proposing to automatically add shims, so I am still a bit confused. I am a bit slow but I'll get there.

@gnzlbg For that case, baz would receive two &lt;4 x float&gt;s from foo and join them into one &lt;8 x float&gt;, which can be considered a shim of sorts. The only thing I am saying would be disallowed is calling an extern "avx" function from a function not explicitly marked #[target_feature("+avx")] (or some superset). That's because the caller has to be able to use AVX registers to call an extern "avx" function.

The only thing I am saying would be disallowed is calling an extern "avx" from a function not explicitly marked #[target_feature("+avx")] (or some superset). That's because the caller has to be able to use avx registers to call an extern "avx" function.

I think that's all right. If a user needs to do this, they can work around it by using a function that glues both ABIs.

FYI the shims are required to make Into::into() work properly:

#[target_feature = "+avx2"] {
  let x = f64x4::splat(...);
  let y : i64x4 = x.into();  // shims required
}

Here, since Into::into is not an AVX2 function, we need to insert shims to convert from the AVX2 ABI to the SSE2 ABI, execute into (which just calls From::from), and then convert back from the SSE2 ABI to the AVX2 ABI.

SIMD code tends to use .into() a lot, so the answer to the question: "how often do we need shims in practice" is probably "all the time".
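The conversion in question can be sketched like this. Plain wrapper types stand in for the `#[repr(simd)]` vectors (an assumption so it compiles on stable); the body is a transmute-style bitwise reinterpretation, i.e., essentially free, which is why any compiler-inserted ABI shims around `.into()` would dominate its cost.

```rust
// Sketch of an f64x4 -> i64x4 conversion via From/Into: the
// conversion itself just reinterprets bits, so it is nearly free and
// ABI shims around the call would be the dominant cost.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct f64x4([f64; 4]);

#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct i64x4([i64; 4]);

impl From<f64x4> for i64x4 {
    #[inline]
    fn from(v: f64x4) -> i64x4 {
        let mut out = [0i64; 4];
        for i in 0..4 {
            // Reinterpret the bits rather than converting the value.
            out[i] = v.0[i].to_bits() as i64;
        }
        i64x4(out)
    }
}

fn main() {
    let x = f64x4([0.0, 1.0, 2.0, 3.0]);
    let y: i64x4 = x.into();
    assert_eq!(y.0[0], 0); // 0.0 is the all-zeros bit pattern
    assert_eq!(y.0[1], 1.0f64.to_bits() as i64);
}
```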

Won't into and from get inlined basically always though? (Also not clear if the significant aspect of shims is execution time, compile time, binary size, ...?)

@glaebhoerl Into::into is #[inline] and (at least without shims) very short, so the intent is that with optimizations enabled it should be inlined when profitable.

Whether From::from is inlined depends on the particular from implementation. The ones in stdsimd are #[inline(always)] but #[inline] would probably be enough once we have shims.

Also not clear if the significant aspect of shims is execution time, compile time, binary size, ...?

Adding shims adds more code, which increases compile time; the question is by how much?

Most applications I know of have only a tiny fraction of explicit SIMD code, so this might not even be measurable. Also, all applications working properly today probably won't need shims (otherwise they wouldn't be working correctly), so I wouldn't worry about this initially beyond checking that compile times don't explode for the crates currently using SIMD.

Whether execution times will be affected is very hard to tell. Applications have little explicit SIMD code, but that code is typically in a hot spot.

As a thought experiment, we can consider the worst case: Into::into, which just calls From::from. If from is just a mem::transmute, then it costs 0 cycles. But we need to shim the argument to from and its result, so the cost goes from 0 to at least 2 cycles. That's an infinite blow-up :D So here, if Into::into isn't inlined, or if it is but the optimizer does not remove the shims, then this would be a performance bug that would need to be fixed.

Beyond the worst case things get better though: if an application is just executing a single SIMD instruction in their hot loop, e.g., taking two arguments, and the shims aren't removed then they might go from 1 cycle to 4 cycles. And if they are doing something more complex then the cost of the shims quickly becomes irrelevant.

In all of these cases:

  • the code would be broken without the shims, so the users could not write it in the first place, and
  • if the shims turn out to be a performance problem in some applications users can change their code such that the shims are not generated (e.g. calling From::from instead of Into::into, refactoring their hot loop/functions to not use vector types with incompatible ABIs in their APIs, etc.).

If binary size/execution speed/compile time turns out to be a problem for debug builds I'd say let's worry about that when we get there.

I've been thinking about this again recently with an eye towards hoping to push SIMD over the finish line towards stabilization. Historically I've been a proponent of adding "shims" to solve this problem at compile time. These shims would cause any ABI mismatch to get resolved by transferring arguments through memory instead of registers.

As I think more and more about the shims, however, I'm coming around to the conclusion that they're overly difficult (and I'm not sure if even possible) to implement. Dealing with function pointers especially is where I feel things get super tricky. Along those lines I've been reconsidering another implementation strategy, which is to always pass arguments via memory instead of by value.

In other words, let's say you write:

fn foo(a: u8x32) { ... }

Today we'd generate something along the lines of (LLVM-wise):

define void @foo(<32 x i8>) {
  ...
}

whereas instead what I think we should generate is:

define void @foo(<32 x i8>*) { ; note the *
  ...
}

Or in other words, SIMD values are unconditionally passed through memory between all functions. This would, I think, be much easier to implement and also jive much more nicely with the implementation of everything else in rustc today. I've historically been opposed to this approach thinking that it would be bad for performance, but when thinking about it I actually don't think there's going to be that much impact.

In general I'm under the impression that SIMD code is primarily about optimizing hot loops, and in these sorts of situations a literal function call is already killing performance anyway. In that sense we're already relying on inlining everything enough to remove the layer of indirection of storing values on the stack. If that's true, I actually don't think that if we leave a call around that happens to take arguments through memory, it'd be that much of a performance loss!

AFAIK the main trickiness around this would be that Rust functions would pass all the vector types via memory, but we'd need a way to pass them by value to various intrinsic functions in LLVM.

In general though, what do others think about an always-memory approach?

eddyb commented

AFAIK the main trickiness around this would be that Rust functions would pass all the vector types via memory, but we'd need a way to pass them by value to various intrinsic functions in LLVM.

The intrinsics don't have the "Rust" ABI and we'd still be passing vector types by value to C, right?

I think this approach is the easiest out of all possible ones, since all you need to change is:

match arg.layout.abi {
    layout::Abi::Aggregate { .. } => {}
    _ => return,
}

There's two ways to do it:

  • add | layout::Abi::Vector { .. } after layout::Abi::Aggregate { .. }
    • this will cast it to an integer if it is the size of a pointer or smaller
  • add a layout::Abi::Vector { .. } => { arg.make_indirect(); return; } arm

Thanks for the tip @eddyb, that was perfect! I've opened #47743 to close this issue.