rust-lang/wg-cargo-std-aware

What will the UX be for enabling "build libstd" logic?

alexcrichton opened this issue · 14 comments

This is intended to be a tracking issue for the eventual state of how one actually enables Cargo building libstd. Currently this is done via the -Z build-std flag. The -Z build-std flag takes an optional list of parameters, but this optional list will ideally no longer be necessary once #5 is implemented.

What should the eventual syntax for -Z build-std be in that case (assuming the flag no longer takes a value)? A strawman proposal might be to add a configuration value build.std (defaulting to false). That way CARGO_BUILD_STD=true would enable it for a build, you could also set it in .cargo/config, and with rust-lang/cargo#6699 it could even be passed as --config build.std=true on the command line.
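To make the strawman concrete, here is a minimal sketch of how the three sources might be combined. It is not Cargo's implementation: build.std is only the proposed key, and the function name, the accepted boolean spellings, and the precedence order (CLI, then environment, then config file, then the default of false) are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Where the effective value of the hypothetical `build.std` key came from.
#[derive(Debug, PartialEq)]
enum Source {
    Cli,
    Env,
    ConfigFile,
    Default,
}

/// Boolean spellings accepted by this sketch (an assumption, not Cargo's rules).
fn parse_bool(s: &str) -> Option<bool> {
    match s.trim() {
        "true" | "1" => Some(true),
        "false" | "0" => Some(false),
        _ => None,
    }
}

/// Resolves `build.std`, assuming the precedence
/// `--config` values > `CARGO_BUILD_STD` > config file > default (false).
fn resolve_build_std(
    cli_config: &[&str],
    env: &HashMap<String, String>,
    config_file: Option<bool>,
) -> (bool, Source) {
    // Scan from the end so that the last `--config build.std=...` wins.
    let from_cli = cli_config
        .iter()
        .rev()
        .filter_map(|arg| arg.strip_prefix("build.std="))
        .find_map(parse_bool);
    if let Some(value) = from_cli {
        return (value, Source::Cli);
    }
    if let Some(value) = env.get("CARGO_BUILD_STD").and_then(|s| parse_bool(s)) {
        return (value, Source::Env);
    }
    if let Some(value) = config_file {
        return (value, Source::ConfigFile);
    }
    (false, Source::Default)
}

fn main() {
    let mut env = HashMap::new();
    env.insert("CARGO_BUILD_STD".to_string(), "true".to_string());
    // The config file says false, but the environment variable overrides it.
    let (value, source) = resolve_build_std(&[], &env, Some(false));
    println!("build.std = {value} (from {source:?})");
}
```

With this ordering, a one-off build can override a project-wide setting without editing any file, which is the convenience the proposal is after.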

One question would be if we can infer this configuration value in some situations. For example if libstd is missing for the target you're building for, should we default to building libstd? In any case, things to think about!

The feature gate to opt into this feature while it is unstable should be a -Z flag and/or an entry in cargo-features, per https://doc.rust-lang.org/cargo/reference/unstable.html

As to the eventual behavior after the feature is stabilized, I feel there should not be a flag at all for this. Rather, Cargo should try to compile the standard library automatically if a binary is not already available for the requested (target, profile configuration).

Oh sorry, right, good point, we should probably discuss that possibility!

I'm personally worried about automatically inferring the profile of libstd and switching to -Z build-std by default for many builds. For example libstd never matches a cargo build profile by default because libstd is optimized and without debug assertions. Similarly libstd also doesn't match the default cargo build --release profile because it has debuginfo. In that sense I think the profile configuration mismatches too regularly: treating a changed profile as a signal that the user wants to recompile the standard library would produce too many "false positives".
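The mismatch can be sketched with a toy model. The three-field Profile struct and the naive_needs_rebuild rule below are hypothetical simplifications. The values for the shipped libstd follow the description above, and the dev and release values are the usual Cargo defaults as I understand them.

```rust
/// A deliberately tiny model of a build profile (hypothetical; real profiles
/// have many more knobs).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Profile {
    optimized: bool,
    debug_assertions: bool,
    debuginfo: bool,
}

/// Default `cargo build` profile: unoptimized, debug assertions, debuginfo.
const DEV: Profile = Profile { optimized: false, debug_assertions: true, debuginfo: true };

/// Default `cargo build --release` profile: optimized, no assertions, no debuginfo.
const RELEASE: Profile = Profile { optimized: true, debug_assertions: false, debuginfo: false };

/// The shipped libstd as described above: optimized, no debug assertions,
/// but with debuginfo.
const SHIPPED_STD: Profile = Profile { optimized: true, debug_assertions: false, debuginfo: true };

/// The naive inference rule: rebuild std whenever the requested profile
/// differs from the one the shipped std was built with.
fn naive_needs_rebuild(requested: Profile) -> bool {
    requested != SHIPPED_STD
}

fn main() {
    for (name, profile) in [("dev", DEV), ("release", RELEASE)] {
        println!("{name}: naive rule says rebuild std = {}", naive_needs_rebuild(profile));
    }
}
```

Under this model both default profiles trigger a rebuild, so the naive rule would rebuild libstd for essentially every existing project.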

Even for missing targets though, that one's also pretty tricky. It's difficult to distinguish between the case of "I forgot to rustup target add" vs "this is a brand new target and I really want to recompile std". For example not all targets are likely to work in the long run with -Z build-std like MinGW ones, or similarly we're able to build more "featureful versions" on CI sometimes which include things like fuzzers, C code in compiler-builtins, Linux binary compatibility, etc.

I'd love to eventually get to a world where we could automatically infer this, but I fear that it's pretty far away that we could robustly automatically infer whether -Z build-std was desired or not, so that's sort of why I'm thinking that an opt-in will be required.

For example libstd never matches a cargo build profile by default

Right, I also think that we should not change the semantics of existing manifests that run on the stable channel. This implies that [profile] should not affect the standard library by default (like before), unless some opt-in key is present.

"I forgot to rustup target add"

That’s a good point. When a std binary can be obtained either by spending bandwidth (rustup target add) or CPU time (recompile), there’s no obvious answer as to which is preferable.

not all targets are likely to work in the long run with -Z build-std like MinGW ones

In theory shouldn’t they work, if for example MinGW (or whatever other requirement) is available?

Ah yeah it's also worth pointing out that my proposal at the top only included .cargo/config, but having one in Cargo.toml also seems reasonable! (via profiles and such)

MinGW/wasi/etc are unfortunately pretty weird in how they're integrated with the compiler, but it's probably best to keep discussion for that on #30.

There are cases where it might make sense to re-build libstd implicitly.

For example, if the target-feature or target-cpu of the target are overridden, libstd should be always re-built using those to avoid soundness bugs, or cargo might want to preemptively error if this cannot be done.

Ideally, at some point, rustc would be able to error when these issues happen, but that might disrupt the workflows of people using -C target-cpu=native, so it would be IMO better if these workflows would be transparently migrated to just rebuild libstd by default.

I wasn’t aware of such soundness issue, could you say more? It sounds worrying, since -C target-cpu=native and such are stable.

In a nutshell, target-features are part of the call ABI, but Rust does not take them into account yet. For example, on an x86 target without SSE, linking these two crates shows the issue:

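// crate A
#[no_mangle] // keep the symbol name so that crate B's declaration resolves to it
#[target_feature(enable = "sse")]
pub unsafe fn foo(x: f32) -> f32 { x }

// crate B
extern "Rust" {
    fn foo(x: f32) -> f32;
}
fn main() {
    unsafe { assert_eq!(foo(42.0), 42.0) }; // UB
}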

The ABI of A::foo is "sse" "Rust" but the ABI declared in B::foo is just "Rust", no SSE, since the SSE feature is not enabled globally. That's an ABI mismatch and results in UB of the form "calling a function with an incompatible call ABI". For this particular case, A::foo expects the f32 in an xmm register, while B::foo expects it in an x87 register, so the roundtrip of 42.0 via foo will return garbage.

Instead of using #[target_feature] on a per function basis, one can just use -C target-feature to cause this issue for a whole crate. And with -C target-cpu, you can cause these issues for multiple features simultaneously.

Now, most features are compatible, e.g., if you have a target like x86_64-unknown-linux-gnu, which enables SSE2, and link against it a crate that enables SSE3 on top, then there is no change in the calling convention, and everything works. But if you try to, e.g., use something like RUSTFLAGS="-C target-feature=+soft-float" to disable the use of hardware floating point, then if the standard library was compiled with hardware floating-point, every time you pass a float to libstd you'll get UB.
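As a rough illustration of that distinction, the sketch below pulls the target-feature toggles out of a RUSTFLAGS-style string and checks them against a list of features that change the calling convention. Everything here is hypothetical: the function names are made up, the parsing handles only the two common spellings of the flag, and the ABI_AFFECTING list contains only soft-float because that is the one example named above. It is not a complete or authoritative list.

```rust
/// Hypothetical list of features that change how values are passed.
/// Only `soft-float` is taken from the discussion; a real list would be
/// target-specific and longer.
const ABI_AFFECTING: &[&str] = &["soft-float"];

/// Extracts `(feature, enabled)` pairs from a RUSTFLAGS-style string.
/// Handles `-C target-feature=...` and `-Ctarget-feature=...` only.
fn parse_target_features(rustflags: &str) -> Vec<(String, bool)> {
    let mut features = Vec::new();
    let mut tokens = rustflags.split_whitespace();
    while let Some(token) = tokens.next() {
        let value = if token == "-C" {
            // The codegen option is the next whitespace-separated token.
            tokens.next().and_then(|t| t.strip_prefix("target-feature="))
        } else {
            token.strip_prefix("-Ctarget-feature=")
        };
        if let Some(list) = value {
            for item in list.split(',') {
                if let Some(name) = item.strip_prefix('+') {
                    features.push((name.to_string(), true));
                } else if let Some(name) = item.strip_prefix('-') {
                    features.push((name.to_string(), false));
                }
            }
        }
    }
    features
}

/// True if any toggled feature (enabled or disabled) is on the
/// ABI-affecting list, i.e. a prebuilt libstd might be incompatible.
fn may_change_abi(rustflags: &str) -> bool {
    parse_target_features(rustflags)
        .iter()
        .any(|(name, _)| ABI_AFFECTING.contains(&name.as_str()))
}

fn main() {
    for flags in ["-C target-feature=+sse3", "-C target-feature=+soft-float"] {
        println!("{flags:?}: may change ABI = {}", may_change_abi(flags));
    }
}
```

Note that -C target-cpu is not handled at all here: working out which features a CPU name implies needs knowledge that lives in the compiler backend, which is part of why this kind of inference is hard to do outside rustc.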

The real fix for this would be for rustc to consider target-features part of the function ABI (e.g. "sse2" "Rust" != "sse3" "Rust"), and then once it can detect that these two ABIs are different, generate shims to adapt them as necessary, e.g., just like we do for all other ABIs (and reject function pointer types). This is a lot of work. There are a couple of temporary fixes that we could introduce (e.g., we could... pass all floats by memory... just like we do for simd vectors...), but if cargo could automatically recompile libstd, then that would definitely fix the issue at least for -C target-feature/cpu which can be used without unsafe code. One could still introduce the issue with #[target_feature], but at least that's explicit in code, is unsafe to use, and not all features are available on stable (and disabling features is not allowed here either).

The issue with -C target-feature and target-cpu is that it may be a soundness issue, not a guaranteed soundness issue, which means that there are likely a lot of users already using it who don't need to recompile std. In that sense I think this is the same problem I mentioned above: it runs a high risk of returning a false positive of "this is a situation where libstd must be built".

I think the fact of the matter is that we just do not have a great mechanism of automatically inferring "libstd must be built from source". I think that factors into the UX of how we want to enable this feature eventually: it should likely be opt-in, but it should also be very lightweight to opt in (like edition = "2018"). I think a CLI flag is likely too onerous to be the stabilized interface for projects that want to rebuild std.

I think the fact of the matter is that we just do not have a great mechanism of automatically inferring "libstd must be built from source".

Agreed.

I think that the appropriate fix for the soundness bugs is for rustc to error when crates with incompatible target-features are linked, hinting users that they should compile their code with compatible features.

The role of cargo would then be to provide nice workflows for doing that. Inferring incompatible target-features is tricky, so I don't think cargo should try to do that. I also don't think it should try to use any simple heuristics (e.g. "if RUSTFLAGS then recompile std"), because most users don't need that.

Maybe the .cargo/config.toml would be a good place for such a flag. Things like RUSTFLAGS can already be specified there. It would also then get an environment variable, so those using RUSTFLAGS="..." cargo build that want to recompile libstd could do that without having to edit the .cargo/config.toml by using the env var instead, e.g., CARGO_REBUILD_STD=1 RUSTFLAGS="..." cargo build or similar.

What's the status of this? The current requirement to pass the -Z build-std flag to every cargo invocation is very cumbersome, with the result that many projects keep using xargo/cargo-xbuild for now.

I know that there are a lot of details that need to be carefully worked out, but would it be possible to add some sort of unstable configuration key to the .cargo/config or Cargo.toml to just enable the -Z build-std flag permanently for a project?

.cargo/config seems alright as it's never published (to e.g. crates.io), so there are no compatibility concerns.

I actually have the opposite thought. If a lib crate depends on getting a fresh libstd or libcore, relying only on .cargo/config wouldn't seem to allow that.

ehuss commented

What's the status of this?

I implemented a prototype with explicit std dependencies in Cargo.toml, but it turns out to be really fussy and complex. We are exploring a different approach where the user does not need to list any crate names when enabling build-std. This requires some significant changes to the standard library so that it builds on all platforms, so we are still exploring the feasibility of the approach.