rust-lang/rust

Tracking issue for RFC 2045: improving `#[target_feature]`

Opened this issue ยท 46 comments

This is a tracking issue for the #[target_feature] segment of RFC 2045 (rust-lang/rfcs#2045).
#[cfg(target_feature)] was tracked in #29717 and has since been stabilized.

Steps

  • Implement the proposed #[target_feature] semantics
  • Document these semantics: rust-lang/reference#545
  • Stabilize
    • x86_64 & i686
      • The basic set (sseโ€“sse4.2, avx, avx2, ...) (#49664)
      • adx: #93745
      • avx512
      • mmx (never going to be stabilized)
      • sse4a
      • tbm
      • lahfsahf
      • prfchw
    • arm
    • aarch64
    • hexagon
    • powerpc
    • mips

@gnzlbg have anything else you want filled out here?

(below added from comments on PR)

  • consensus on the API for run-time feature detection
  • should cfg!(feature) work across #[inline(always)] functions, generics, etc?

And some related tasks:

  • API breaking change: allow #[target_feature] on unsafe functions only
  • API breaking change: #[target_feature = "+feature"] => #[target_feature(enable = "feature")]
  • fix bug: #42515
  • resolve bug: #44367

@alexchrichton

Yes, we should also clear some of the unresolved questions:

  • consensus on the API for run-time feature detection
  • should cfg!(feature) work across #[inline(always)] functions, generics, etc?

And some related tasks:

  • API breaking change: allow #[target_feature] on unsafe functions only
  • API breaking change: #[target_feature = "+feature"] => #[target_feature(enable = "feature")]
  • fix bug: #42515
  • resolve bug: #44367

It would be nice if those API breaking changes could be prioritized.

Rust already will sometimes not inline a #[inline(always)] function (notably recursive situations), so unsafe code already can't assume that #[inline(always)] code will always be inlined.

@retep998 cfg!(feature) does not require any unsafe code.

@gnzlbg I meant in regards to where the RFC states that #[target_feature] and #[inline(always)] mixed together should result in an error. We can simply not inline functions in that case, even if they are marked as #[inline(always)], because Rust will already sometimes not inline functions marked #[inline(always)].

@retep998 We could definitely relax this by just not inlining an #[inline(always)] function. I don't know if we want to do so: the user did not write inline, but inline(always), we should at least warn here.

What we cannot currently do is inlining a function across mismatching features (independently of its inlining annotations). The alternative sections of the RFC explores this a bit though.

Any news on implementation status of --enable-features="feature0,feature1,..." proposed in the RFC?

@newpavlov this is being discussed in issue 49658 . (note from pnkfelix: did this mean to say #49653 ?)

#48556 has entered FCP which I believe will implicitly stabilize this attribute

So after the inline-always fix it looks like the only bug open still being tracked here is #42515 .

That bug looks hard to solve and probably won't make it before stabilization.

The following issues look "easy" to fix low-hanging fruit. It might be worth to fix these before stabilization:

I'm reopening this as tracking issue for the non-x86 #[target_feature]s.

Since the word "feature" is quite overloaded I'll try to use the words "Rust feature" and "target feature" to disambiguate.

I think there are some big issues with the x86 Rust feature gating as implemented right now. The following unstable Rust features were added after the main x86 target feature stuff was stabilized:

  • avx512_target_feature
  • mmx_target_feature
  • sse4a_target_feature
  • tbm_target_feature
  • adx_target_feature
  • cmpxchg16b_target_feature
  • movbe_target_feature

However, the Rust feature gate only affects the target_feature(enable) and cfg(target_feature) attributes. It does not cover the matchers of the is_x86_feature_detected macro, and it does not (always) cover the corresponding intrinsics either. For example, this compiles on stable:

use std::arch::x86_64::*;

fn main() {
    if (is_x86_feature_detected!("sse4a")) {
        println!("{:?}", unsafe { _mm_extract_si64(_mm_setzero_si128(), _mm_setzero_si128()) });
    }
}

Here's what RFC 2325 says about this:

It's not expected that the contents of std::arch will remain static for all time. ... Rather once [new intrinsics are] implemented they'll be stabilized and included in libstd following the Rust release model.

It doesn't say so explicitly, but I assume the stdsimd Rust features used to "follow the Rust release model" should match the Rust features for the target features used. Or at the very least, these Rust features should be stabilized at the same time.

... is_target_feature_detected!. The macro will accept one argument, a string literal. The string can be any feature passed to #[target_feature(enable = ...)] for the platform you're compiling for.

It's not stated explicitly, but I assume this is meant to only enable those strings that you can pass to target_feature(enable) with your current compiler.

How can we clean up the current mess?

It does not cover the matchers of the is_x86_feature_detected macro,

I think we could fix this in the is_x86_feature_detected macro. I don't think we can just cfg out the match arms but we could simply cfg in some compile_error! if these features are not enabled when the match arms are used. There might be a better way though.

and it does not (always) cover the corresponding intrinsics either.

That's because those intrinsics are stabilized:
https://github.com/rust-lang-nursery/stdsimd/blob/master/crates/core_arch/src/x86/sse4a.rs#L56

I don't know if this stabilization was accidental or not (It looks to me to be accidental since those feature gates were created to not stabilize these features).

How can we clean up the current mess?

For the two points mentioned above, PRs to stdsimd should suffice.

For the issue about #[target_feature(enable = "...")] accepting unstable features, we will need some changes to rust-lang/rust.

For the two points mentioned above, PRs to stdsimd should suffice.

I didn't do a full audit of all target features/intrinsics that should or shouldn't be Rust feature gated.

Basically, all the features here

// unstable `#[target_feature]` directives
should be unstable. That is, using these in #[target_feature(enable = "...")], #[cfg(target_feature = "...")], and using the core::arch:: intrinsics that correspond to them should be unstable and require enabling the appropriate feature.

Stabilizing any of these would require at least a mini-FCP.

That's the easy part of the audit. We know that part works right.

That's the easy part of the audit. We know that part works right.

I know it worked (so did everything back then AFAICT), but I don't know if everything is still working right now (as you point out, some things aren't working).

I am going to add compile fail tests to stdsimd to make sure that things don't break again.

This feature seems too inconvenient to use right now. Having to mark everything unsafe is not great, but marking individual functions is also not very useful when writing generic code. Consider the following code, in which I have a larger algorithm that ties together two computational kernels:

trait Kernel1 {
    ...
}

trait Kernel2 {
    ...
}

fn compute<K1: Kernel1, K2: Kernel2>(...) {
    ...
}

I have several hand-optimized versions of either kernel using SIMD intrinsics for various feature sets. I have a wrapper function for compute that uses feature detection to figure out which implementation of the algorithm to call. This doesn't work very well however. I'm seeing a 35% reduction in performance when doing this, compared to a compiling a particular implementation setting the features via RUSTFLAGS. When monomorphizing compute, what I want is that the compiler computes the union of the features needed for K1 and K2 and uses that in the codegen for compute and all functions called from compute.

Thanks @gnzlbg! Do you have a link to those RFCs? Also in my case I'm only using associated functions, not methods.

@jethrogb there is a PR open that fixes the accidental stabilization: #64534 .

@jonas-schievink this should probably be marked with A-stability as well, since an accidental stabilization was reported here.

With the impending release of Apple Silicon macs, it could be useful to start stabilizing the basic aarch64 target_feature support (in particular, I care about neon). This'll help crates provide high-quality simd implementations on this platform. Anything I can do to help in this effort?

@joshuawarner32 You can help by adding more NEON intrinsics to https://github.com/rust-lang/stdarch if you want extensive aarch64 NEON support. It does not cover that many intrinsics, and we'd want more testing of more NEON intrinsics.

Are there more boxes that should be checked for this tracking issue? For example, I'm able to use the arm target features on nightly without a feature.

ehuss commented

@bradjc Which ARM features are you able to use? They should all be unstable.

I was building with --target=thumbv7em-none-eabihf -Ctarget-feature=+armv7e-m,+m7,+v7clrex,+fp-armv8d16,+hwdiv,+fp64,+fpregs,+fpregs64,+noarm,+db,+thumb2 and then checking literally every arm target-feature using cfg!() and #[cfg()] without enabling arm_target_feature.

ehuss commented

@mattico This issue is about stabilizing the #[target_feature] attribute. The -C flag has been stable since 1.0.

@ehuss Ah, yes. Was confused because the same whitelist is used for both purposes.

They have similar nomenclature and use cases, but #[cfg(target_feature)] is not #[target_feature].

Perhaps the issue description should be updated to note that this only tracks a portion of rust-lang/rfcs#2045 ? Because that RFC seems to specify cfg_target_feature in addition to target_feature, and this issue description does not make it clear that cfg_target_feature was tracked elsewhere and has since been stabilized (#29717).

From looking at the checked boxes in this issue description, I assumed that nothing in RFC 2045 had been stabilized for non-x86 architectures.

Am I right in saying that cfg!(target_feature = "...") doesn't currently match the description from the RFC? I.e. it doesn't depend on the current context, only on the build arguments. This means that it cannot be used to selectively enable code in a function based on the context that the function is inlined into.

According to the RFC, this code should panic, but currently it does not:

playground link

fn main() {
    if is_x86_feature_detected!("avx2")
    {
        println!("Has AVX2");
        unsafe { foo() };
        println!("Didn't panic!");
    }
}

#[target_feature(enable = "avx2")]
unsafe fn foo() {
    if cfg!(target_feature = "avx2") // this should always be true
    {
        panic!();
    }
}
ehuss commented

@dylanede Correct, cfg_target_feature is not context-aware as described in the RFC. I believe #42515 is tracking that, and it was intentionally stabilized without that being implemented (though there wasn't mention of the cfg! macro specifically).

FYI, opened a PR where I tried to pick up @gnzlbg's work from #60109 stabilizing the ADX target feature:

#93745

The lang-team looked at this issue during its backlog bonanza on 2022-06-09

There are a couple of problems here.

First, from my own attempts to quickly review the status here, It is not immediately clear which items that are listed on this issue's description represent cross-cutting concerns for all (or nearly all) uses of #[cfg(target_feature="...")] and/or #[target_feature="...")], and which are "just" to-do items of potential CPU features to support.

One thing that did become clear during my review is that the checklist in the description is not really being maintained. (E.g. an early checkbox item, "add documentation", was left unchecked; I determined that documentation had been added and checked it off myself. As another example: there were suggested checkbox items that seemed like they should have been transcribed to the description, but instead were left on the comment thread.)

A big checklist that is not been maintained (i.e. the open items may or may not be actually done) is not great.

Furthermore: The T-lang team thinks its is not great to block resolving this issue on every potential CPU feature that we could support.

At the time of that backlog bonanza meeting:

  • the T-lang understanding was that the main question remaining here was whether the CLI flag --enable-feature, documented in RFC 2045, was still desireable. (Note the RFC discusses how this flag differs from the already present (and defacto stable) -C target-feature=... CLI flag.
  • the thinking was that we could/should fcp close this issue, noting that any new features slated for stabilization should be in new issues.

However, since reviewing the comment thread on this issue, I'm not convinced that its ready for an fcp close. Or at least, I was not able to determine the current status on a number of (unlinked, potentially unfiled?) issues that are listed for it, and it would take a bit more effort for me to build experimental code to establish the status on those issues.

Perhaps most worrisome here: I am not sure that the owners who were most invested in driving this feature forward are currently in a state where they are going to continue doing so.

In other words: I think this issue is in need of an owner.

(I would be happy to be proven wrong here. If it is true that there is no longer interest in making a --target-feature flag, at least in the short term, and that there are no longer cross-cutting issues, then perhaps the only remaining work is truly the checklist of target-specific features. if that is true, then I would suggest that we fcp close this issue, and open up specific issues for each such target-specific feature. Its too much work to ask a single owner to drive progress on each such thing.)

Hello! Please add stabilization state of RISC-V architecture in issue details as well :) It only has x86_64, arm etc.

We discussed this in a @rust-lang/lang backlog review today.

We feel like this tracking issue tracks two different things:

  • Target-specific features, of which there's a never-ending stream as CPUs add new features. This isn't something we should have an overall tracking issue for, just tracking issues for adding and stabilizing new features.
  • Enhancing target_feature itself (attributes, command-line options, etc) to make it easier to use, safer to use, usable in more places, etc. This is something we'd love to see, but it needs an owner.

API breaking change: allow #[target_feature] on unsafe functions only
API breaking change: #[target_feature = "+feature"] => #[target_feature(enable = "feature")]

Both of these are true today, except for the first on WASM platforms, where #[target_feature] was made applicable to safe functions knowingly.

Any update on AVX 512 support?

Hi, is there anything blocking stabilising arm_target_feature?

Hey! I am currently implementing RISC-V Zk intrinsics, and I am running into a pattern that is currently not well covered by this PR.

Either Zkne and Zknd extension needs to be enabled to get access to an intrinsic. I imagine for the compiler it is enough to write:

#[target_feature(enable = "zkne")]
pub unsafe fn aes64ks1i(...) -> ... {
     ...
}

But this produces wrong cargo doc output and can be misleading. Namely, it states that zkne needs to be enabled.

#[target_feature(enable = "zkne", enable = "zknd")]
// OR
#[target_feature(enable = "zkne")]
#[target_feature(enable = "zknd")]
pub unsafe fn aes64ks1i(...) -> ... {
     ...
}

Produces zkne AND zknd.

#[target_feature(enable = "zkne,zknd")]
pub unsafe fn aes64ks1i(...) -> ... {
     ...
}

Produces zkne,zknd which does not exist.

I feel like this is only a documentation issue, but still an issue.

majaha commented

I'm not sure if this is the right place for this, but the wasm feature #[target_feature(enable = "multivalue")] is broken at the moment, as it leaks into other untagged functions: Godbolt
This is essentially for the same reasons I've outline here: #83788 (comment)
There may be other target features with similar non-local behaviour, I haven't tested them all.

I tried reading through the open issues regarding target features, but could not find a more fitting one for this question. I have the following code, which currently returns different values on stable vs nightly when compiled with -Ctarget-cpu=skx (Godbolt). I would have expected that target-cpu implicitly enables that feature, or a warning/error that the avx512 target features are still unstable.

pub fn preferred_vec_size() -> usize {
    if cfg!(all(target_arch = "x86_64", target_feature = "avx512f")) {
        64
    } else if cfg!(all(target_arch = "x86_64", target_feature="avx")) {
        32
    } else {
        16
    }
}

That's... huh. I guess the first branch is being ignored as an invalid feature gate on stable.

Are there any particular stabilization blockers for feature(avx512_target_feature), or does it just need a stabilization PR?

Edit: response here