Support for macOS Universal/fat binaries
kornelski opened this issue · 40 comments
macOS (and iOS) has a concept of universal binaries which contain code for multiple CPU architectures in the same file. Apple is migrating from x86_64 to aarch64 CPUs, so for the next few years it will be important for macOS developers to build "fat" binaries (executables and cdylibs).
Apple's Xcode has very helpful default behavior for this: when building in release mode, it automatically builds for both x86_64 and aarch64 together. In the debug mode, like Cargo, Xcode builds only the current native architecture.
Could cargo --release on macOS hosts automatically build both x86_64-apple-darwin and aarch64-apple-darwin, and merge them into a single executable? Merging requires running lipo -create -output universalbinary intelbinary armbinary.
I think support for universal binaries should be built into Cargo:
- For the next 3-5 years it will be a necessary operation for every macOS release build.
- Cargo lacks support for general-purpose post-linking steps (#545). That issue has been in limbo for years, but arm64 (M1) Macs have already shipped, and support for Universal binaries is needed right now.
- Even if Cargo did have post-build steps, it would be a chore to re-add the same necessary step to every project.
- There's huge value in cargo build --release working for projects out of the box. Without building Universal binaries this becomes half-built, and insufficient for macOS developers.
Seems reasonable to me to support! Cargo already has (unstable) support for building multiple targets at once, and it sounds like that's almost exactly what this wants (with just one final step). I think the first step here would be for a proposal to be made, followed by an unstable implementation.
To add to this, it seems that currently, trying to link a universal binary fails with the following error:
failed to add native library [library path] file too small to be an archive
It would be nice if I could just link against a universal binary and cargo would be able to link it. This will probably have a lot more edge cases, since there are multiple ways to link a library. Currently the alternative is to add a build step using lipo -thin, and while this isn't too bad, it is a chore to keep re-adding that step.
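To make the lipo -thin workaround concrete, here is a minimal Rust sketch of what extracting one architecture from a fat file involves. It assumes the classic 32-bit fat header layout (big-endian magic 0xcafebabe, an entry count, then 20-byte entries holding cputype, cpusubtype, offset, size, align) and ignores the 64-bit fat variant. The names thin_slice, read_u32_be and sample_fat are made up for this example, and the sample data is a fake file, not real Mach-O output.

```rust
// Assumed constants from the classic 32-bit fat header layout.
const FAT_MAGIC: u32 = 0xcafe_babe;
const CPU_TYPE_X86_64: u32 = 0x0100_0007;
const CPU_TYPE_ARM64: u32 = 0x0100_000c;

/// Reads a big-endian u32 at `at`, or None if the data is too short.
fn read_u32_be(data: &[u8], at: usize) -> Option<u32> {
    let bytes = data.get(at..at.checked_add(4)?)?;
    Some(u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
}

/// Returns the bytes of the slice for `cpu_type`, roughly what `lipo -thin` writes out.
/// Returns None for non-fat input (for example a plain `ar` archive) or a missing arch.
fn thin_slice(fat: &[u8], cpu_type: u32) -> Option<&[u8]> {
    if read_u32_be(fat, 0)? != FAT_MAGIC {
        return None;
    }
    let count = read_u32_be(fat, 4)? as usize;
    for index in 0..count {
        let entry = 8 + index * 20; // entries follow the 8-byte header
        if read_u32_be(fat, entry)? == cpu_type {
            let offset = read_u32_be(fat, entry + 8)? as usize;
            let size = read_u32_be(fat, entry + 12)? as usize;
            return fat.get(offset..offset.checked_add(size)?);
        }
    }
    None
}

/// Hand-built fake fat file holding two 4-byte placeholder "binaries".
fn sample_fat() -> Vec<u8> {
    let mut fat = Vec::new();
    fat.extend_from_slice(&FAT_MAGIC.to_be_bytes());
    fat.extend_from_slice(&2u32.to_be_bytes());
    // cputype, cpusubtype, offset, size, align (align left as 0 in this fake file)
    let entries: [[u32; 5]; 2] = [
        [CPU_TYPE_X86_64, 3, 48, 4, 0],
        [CPU_TYPE_ARM64, 0, 52, 4, 0],
    ];
    for entry in entries.iter() {
        for field in entry.iter() {
            fat.extend_from_slice(&field.to_be_bytes());
        }
    }
    fat.extend_from_slice(b"X86!");
    fat.extend_from_slice(b"ARM!");
    fat
}

fn main() {
    let fat = sample_fat();
    match thin_slice(&fat, CPU_TYPE_ARM64) {
        Some(slice) => println!("arm64 slice: {} bytes", slice.len()),
        None => println!("no arm64 slice found"),
    }
}
```

A thin static library starts with the ar magic rather than the fat magic, so thin_slice returns None for it. A real tool would also need to handle the 64-bit fat header and validate alignment.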
Could cargo --release on macOS hosts…
Can we also add a way to cross-compile to a macOS universal binary?
I know of a few projects that build their macOS binaries on Linux - they would also want a way to build universal binaries.
Specifically:
- If their target is x86_64-apple-darwin, we can't change the output to a fat binary
- If their target is apple-darwin, we might be able to change the output to a fat binary
Does specifying apple-darwin actually do anything right now?
Is this issue about being able to link to fat libraries, or about producing fat libraries without having to manually call lipo post-build in cargo? Both are a problem, but the most pressing one is just being able to link against fat libraries without the "file too small to be an archive" error.
I've meant this as a feature request for building fat binaries. I think "failed to add native library" could be considered a bug/incompatibility, and handled separately.
@kornelski you can use cargo-lipo today to simplify the process, but you'll still be forced to link against thin libraries because of a long-time linker issue. The linker currently doesn't handle the fat library header correctly, which was a very annoying problem for iOS, but now I guess it's going to become an even bigger problem with macOS and the ARM transition. It was a problem for macOS when we still made fat libraries for intel 32-bit and 64-bit, but since 32-bit was dropped most macOS libraries have become thin libraries. This is no longer the case because of ARM64, so we'd really appreciate a fix for the linker as a first step to make this more pleasant :)
How would this be achieved? In my opinion, it may require building two binary files and doing some work after the build is finished. Could we achieve this by adding a cargo script that generates the final output file after all binaries (in this case, the same code built for different targets) are built?
It is better to simply build once per architecture and then combine the single-arch binaries (thin) into a multi-arch binary (fat). See my previous answer for how it can be done today using cargo-lipo, but there is also a long-standing linker bug that prevents linking against fat libraries.
I support this proposal and think that cargo build --release should build for the same targets as xcodebuild -configuration Release does on its stable version. Since Xcode 12.2, this is x86_64 and arm64.
Xcode 12.2 and later automatically adds the arm64 architecture to the list of standard architectures for all macOS binaries, including apps and libraries. During the debugging and testing process, Xcode builds only for the current system architecture by default. However, it automatically builds a universal binary for the release version of your code.
At some point Xcode will remove x86_64 from their list of standard release architectures, and at that point (or a certain period after) cargo build --release should be updated to reflect that and only build for arm64.
I wonder at which level it should be done in Cargo, given that Rust has a concept of a target, and obviously it'd be very weird if the x86_64-apple-darwin target built things for ARM (and ARM-only eventually).
I don't think universal-apple-darwin would make sense to exist as a Rust target, because it completely doesn't fit what Rust considers a target.
So I suppose all the universal magic would have to be limited to the invocation of cargo build/run pretty early, at a high level, so that Cargo itself would change it to be equivalent to cargo build --target=x86_64-apple-darwin + cargo build --target=aarch64-apple-darwin running together.
@kornelski I second this, we shouldn't add a "universal" target, because it just makes everything much more complicated, without considering the fact that "universal" is not even a target, and could mean more than one thing (it is a combination of multiple targets, but we don't know which ones).
I think the current cargo-lipo solution could simply be ported directly into cargo instead of being a separate tool: basically keep building for one target at a time, but make it possible to call cargo telling it to build for multiple targets and produce universal binaries. Under the hood all it does is build for each target + combine the multiple thin binaries into a single fat binary using lipo.
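As a rough illustration of "build for each target + combine", here is a Rust sketch that only plans the commands such a wrapper would run, without executing them. The function name universal_build_plan and the output name foo-universal are hypothetical. The thin binary paths assume Cargo's default target/<triple>/release/<name> layout, and the lipo invocation follows the -create -output form quoted in the original post. This is not how cargo-lipo is actually implemented.

```rust
/// One external command: the program name followed by its arguments.
type Command = Vec<String>;

/// Plans a universal release build: one `cargo build` per target triple,
/// then a single `lipo -create` that merges the thin outputs.
fn universal_build_plan(bin_name: &str, targets: &[&str], output: &str) -> Vec<Command> {
    let mut plan: Vec<Command> = Vec::new();
    let mut thin_paths: Vec<String> = Vec::new();

    for target in targets {
        plan.push(vec![
            "cargo".to_string(),
            "build".to_string(),
            "--release".to_string(),
            "--target".to_string(),
            target.to_string(),
        ]);
        // Assumes Cargo's default output layout for explicit --target builds.
        thin_paths.push(format!("target/{}/release/{}", target, bin_name));
    }

    let mut lipo: Command = vec![
        "lipo".to_string(),
        "-create".to_string(),
        "-output".to_string(),
        output.to_string(),
    ];
    lipo.extend(thin_paths);
    plan.push(lipo);
    plan
}

fn main() {
    let targets = ["aarch64-apple-darwin", "x86_64-apple-darwin"];
    for command in universal_build_plan("foo", &targets, "foo-universal") {
        println!("{}", command.join(" "));
    }
}
```

Keeping the plan as data makes it easy to print for a dry run or to hand each entry to std::process::Command. A built-in version would presumably also need to handle cdylibs, custom target directories and a missing lipo tool.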
The only downside is that this won't solve the problem of cross-compiling, but there are many other issues that make this quite difficult anyway, starting with the availability of tools like "lipo" outside of Xcode on macOS. This tool is open sourced by Apple as part of the cctools package, but they never bothered porting it to other platforms themselves. There exists a Linux port of cctools with a copy of lipo that I have successfully used myself: https://github.com/tpoechtrager/cctools-port
I don't think we need to go through all this trouble. A first improvement that brings the functionality of cargo-lipo directly into cargo, while relying on the presence of a "lipo" command-line tool (we could support it on Linux if you install the cctools-port), would already be more than good enough.
FYI, I have implemented a "lipo" like crate recently: https://github.com/messense/fat-macho-rs
I think we may have circled all the way back to @alexcrichton's initial response at the top of the issue. #8176 is what he was referring to by existing unstable multi-target support. You would just need a way to tell cargo to add the build step afterwards.
I don't think it should be a default for all release builds on macOS. While you can run benchmarks with cargo, perhaps the most common reason for building in release mode is to test speed. I don't want to double the LLVM codegen time and linking time which is already significantly slower on macOS (ld64) without lld support for mach-o, just so people don't have to change their dist build script to add some flag. I'm pretty sure every single distribution of code to macOS has had to touch their release scripts.
Raw suggestion which is quite extensive and a lot of work probably? But have a look:
[lib]
...
[lib.targets.macos-universal]
crate-type = ["...", "..."]
targets = ["x86_64-apple-darwin", "aarch64-apple-darwin"]
# and either
post-build = "lipo_universal.rs"
# runs lipo on the outputs that it reads from env variables.
# Pretty easy to write this.
# Then you can use this for arbitrary post-processing including strip, etc...
# or supply a list of builtin post-processing steps.
post-build = ["lipo"]
# or both! Cargo's builtins run first. Need to name them differently. Or recognise file names ending with .rs and have them all in the one list.
Some questions/thoughts:
- do we need this outside the lib target configuration? Probably yes, the bin target as well. Do you want to share those two? Probably fine not so DRY but fine. You likely want them to be different anyway (strip binaries, not libraries...)
- is making people write a rust build script to execute lipo too much effort? Probably yes, so probably need a builtin. But I like that you can let people use their own tool for it. Xcode actually doesn't call lipo -- it uses libtool (not the GNU one) to do this job.
- obviously doesn't apply to rlib crate-type outputs, which you can't do post processing on.
- [lib.targets.X.targets] is dumb. Please bikeshed this for me.
- big remaining question is how you activate this custom configuration from the command line.
- do you add these fields to the [lib] section directly as well? Probably yes?
Ok, big pie-in-the-sky idea here, but:
- can you use this to implement wasm-pack/ wasm-bindgen's post processing????
- ??? I think so! Why not?
- probably need a way for build-dependencies to extend the list of builtin post processing steps. Otherwise not ergonomic enough. That seems like a lot of work? Unless all they need to do is have a binary target that is executed as if it's a normal post-build script.
- there are, however, a lot of cargo wrappers out there that could really just work in this position.
Ok, I'll stop there.
A post-build script is a more general approach. On other platforms we also need a way to fuse code from different architectures, and it would let us do further work, like what we already do in build.rs.
Some further comments on reflection:
- Post build scripts via crates with binaries would eliminate the need for builtin steps like "lipo". Just have a crate that runs lipo and let the community figure it out.
- Re wasm-bindgen, this solves the problem that wasm-pack was invented to solve, namely having an on-demand binary wasm-bindgen-cli with the exact same version number as the wasm-bindgen whose macros were used in the crate. Using the crate graph to do this seems good. (Except... oh wait... resolver V2 kills this guarantee! That's ok, at least the CLI/new post-build package can check an exported version string from its dep and tell you when there's a mismatch.)
- The only reason you need multiple named configurations like macos-universal is so you can avoid doing universal builds all the time in release mode. Xcode gets around this in two ways. One is a "Build Active Architecture Only" flag set by default only in debug mode. The second is that you can create another Configuration for profiling, and set that flag for it, not affecting release builds. (And then select the profiling configuration in a Scheme.) Cargo doesn't have user defined named configurations. This would introduce a form of them, without adding any actual debug/release configurations that would spiral complexity into rustc.
- If you didn't care about that you could just have a setting similar to Xcode's "Build Active Architecture Only", and only have these options in [lib], not selectable configurations of [lib]. But this means now you can only have one platform's universal binary configuration per Cargo.toml, unless you make it all overridable by CLI flags. Does that set some kind of default target? Seems bad. Maybe it would be good if a Cargo.toml could say "default target wasm32-unknown-unknown please"? But I'm unconvinced in any other case; almost nobody building a static library for macOS wants to make macOS the default even on Linux.
- This ought to be powerful enough to have a build script that creates an .xcframework. Xcframeworks work by having LLVM embed platform information into the binary and then reading it when assembling the .xcframework. This is carried through the LLVM target triple (aarch64-apple-ios11.0-simulator) but Rust targets do not carry this information (just -ios) in the triple. All you need to change is the LLVM target triple (currently possible using JSON targets on nightly) and rustc can then produce the right metadata. So as long as you can specify JSON target filenames in your list, then that can work today (on nightly). (To be clear, this is not a great idea, whoever wants an .xcframework for easy importing probably also wants a Swift interface. But also thanks @messense for the idea -- using goblin can probably avoid all this LLVM target triple stuff and work on stable by simply rewriting the version min load commands.)
- if that works (building the entire thing up to what, 10 times?), then you really want it to be optional in release mode.
- Another use case is cbindgen.
- You probably want some CLI options for this anyway. At least such that you can call one of these configurations with different targets.
I think involving custom post-build scripts here is a mistake. This problem is not a custom job, it's a common requirement for an entire platform.
Note that the baseline for this is something like:
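build.sh:
# Build a thin release binary for each architecture.
cargo build --release --target=aarch64-apple-darwin
cargo build --release --target=x86_64-apple-darwin
# Merge the two thin binaries into one universal binary named foo.
lipo -create -output foo target/aarch64-apple-darwin/release/foo target/x86_64-apple-darwin/release/foo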
so if users had to write custom TOML config or custom scripts in Rust that implement the same thing, it wouldn't simplify anything.
And trying to solve all the problems of wrangler, cbindgen, IDE integration, etc. at the same time will mean this issue will be paralyzed by additional incompatible requirements, scope creep, and won't get done (at least not before Apple drops x86 support making it moot ;)
For configuration, I suggest:
- --release defaults to universal binary
- debug build defaults to current target only
It could be controlled with:
cargo build --no-apple-universal --release
cargo build --apple-universal
and
[profiles.dev]
apple-universal = true
[profiles.release]
apple-universal = false
or apple-universal = ["aarch64", "x86-64"], but this may be unnecessary, since currently there are no other archs that Apple supports. It could be added later in case Apple decided to change their CPUs once again :)
The lipo workflow for creating fat universal .frameworks was deprecated at WWDC'19 in favor of .xcframeworks:
lipo \
-create \
<PATH> \
<PATH> \
-output <PATH>
xcodebuild \
-create-xcframework \
-framework <PATH> \
-framework <PATH> \
-output <PATH>
AFAIK xcframeworks are not for releases for end-users to use, but for Xcode's package manager and for bundling libraries and header files together. At best they're like a fancy .app bundle, not a binary. Cargo builds binaries.
BTW: it's super annoying that Apple doesn't announce their OS changes in writing, and WWDC developer marketing videos are used instead of written changelogs and technical documentation.
@kornelski That's correct indeed.
But since .xcframework is the only way to make a Swift package depend on a (.dylib or .a) library (such as those produced by cargo), I'd argue that whether .xcframework is a simple binary format or actually much more than a mere binary (which was the case for .framework, too) does not really matter.
The standard method for distributing libraries for iOS/macOS has always been .framework, not .a or .dylib, for what it's worth.
@kornelski So what is the procedure to make a universal binary right now?
@kornelski I'm also interested to know the answer to this question.
@Voodlaz @fdv1 the simplest way to produce a universal binary right now is to use a tool like cargo-lipo, which builds once for each target and then calls lipo to merge the binaries together. You can also call lipo directly to do the same. If you need lipo on non-macOS, it's possible to either use a Linux port of the cctools or use llvm-lipo, which is not built in regular clang+llvm distributions. I have both of these in my own clang+llvm builds if anyone needs them.
@awakecoding but cargo-lipo is deprecated in favor of doing it through Xcode, as it says on the GitHub page. That's why I'm asking what the procedure to make a universal binary is RIGHT NOW (bad phrasing I guess, should've said "currently"). Also, if I'm not wrong, cargo-lipo is for iOS. Will there be any problems with using it for macOS?
cargo lipo doesn't work on executables, last I checked. It only does staticlibs, so it's not really a universal solution.
@Voodlaz got a link to the github page? Even if Xcode has a list of targets to build for, only AppleClang has built-in support for those, a toolchain like Rust needs to build once for every architecture and then merge with lipo. The original lipo (and the portable replacements, like cctools for Linux and llvm-lipo) should work just fine on static libraries, shared libraries and executables. Maybe cargo-lipo needs some love, but calling lipo should work on macOS, iOS, and for all binary types.
I haven't used cargo lipo in a while, since we ended up doing single-architecture builds that we merge using lipo post-build for all of our Rust projects (both macOS and iOS). The issue with cargo-lipo is that the developer only cared about iOS when in fact it's the same thing for macOS. You don't need Xcode at all to produce universal binaries; the only thing you need is a functional lipo command-line tool and some sort of wrapper script to do the merge post-build. cargo-lipo was just a wrapper to do it through cargo, but it is apparently unmaintained now.
@awakecoding Okay
I've updated cargo xcode to support lipo for both iOS and macOS.
Was a separate issue ever filed for the "unable to link against a universal static archive"?
If not, I'll file one but I want to make sure I didn't miss one in the thread!
Anything new here?
On deeper investigation, the existing rustc bug (rust-lang/rust#55235) is probably the best place to track this -- the ability to link static .a files into libraries seems to be a rustc bug/feature, not a cargo one.
@randomairborne came up with a simple tool to build universal binaries:
https://github.com/randomairborne/cargo-universal2
For a quick fix, the Golang rewrite of lipo is great for running locally or in a GitHub Action. The latter's particularly interesting because macOS runners are expensive.
Tools like cargo-lipo and Tauri's bundler use lipo behind the scenes for commands like tauri build --target universal-apple-darwin. Tauri's bundler is a fork of cargo-bundle, which doesn't support universal binaries.
I'm considering RIIR lipo since it was open sourced and adding it to cargo-bundle. Thoughts?
I think llvm-lipo is also another alternative to macOS lipo.
This is a year late, but xcframeworks are for bundling multiple platforms together (like simulator and device, or macOS and iOS). Universal binaries are still the recommended way to have multiple architectures for one platform.
To add to this, it would be nice if the universal binary only included a single copy of include_bytes! and include_str! data. Two copies are placed into the universal binary when you use the lipo command, which seems wasteful.
Intriguing. Do you know of a place where I can read more about this?
Well, the universal binary is basically an archive containing one binary per target architecture. Given that each architecture is compiled separately and they are only bundled at the end, it's not surprising that the bytes are included twice. You can easily extract the separate binaries from the universal one, and each is completely standalone for its architecture.
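To see why the data ends up duplicated, here is a small Rust sketch that assembles a fat file the naive way, assuming the classic 32-bit fat header (big-endian magic 0xcafebabe, entry count, then 20-byte entries of cputype, cpusubtype, offset, size, align). The names make_fat, align_up and count_occurrences are invented for this example, the inputs are fake placeholder binaries, and the alignment of 4 bytes is only for the demo; real tools pick a much larger per-architecture alignment.

```rust
const FAT_MAGIC: u32 = 0xcafe_babe;

/// Rounds `value` up to the next multiple of 2^align_log2.
fn align_up(value: usize, align_log2: u32) -> usize {
    let align = 1usize << align_log2;
    (value + align - 1) & !(align - 1)
}

/// Concatenates thin slices behind a fat header. Each input is
/// (cputype, cpusubtype, bytes); every slice is copied in whole.
fn make_fat(slices: &[(u32, u32, &[u8])], align_log2: u32) -> Vec<u8> {
    // First pass: decide where each slice will live.
    let mut offsets = Vec::new();
    let mut end = 8 + 20 * slices.len();
    for (_, _, bytes) in slices {
        let offset = align_up(end, align_log2);
        offsets.push(offset);
        end = offset + bytes.len();
    }

    // Header: magic, entry count, then one entry per slice.
    let mut out = Vec::with_capacity(end);
    out.extend_from_slice(&FAT_MAGIC.to_be_bytes());
    out.extend_from_slice(&(slices.len() as u32).to_be_bytes());
    for ((cpu_type, cpu_subtype, bytes), offset) in slices.iter().zip(&offsets) {
        let entry = [*cpu_type, *cpu_subtype, *offset as u32, bytes.len() as u32, align_log2];
        for field in entry.iter() {
            out.extend_from_slice(&field.to_be_bytes());
        }
    }

    // Body: zero padding up to each aligned offset, then the slice itself.
    for ((_, _, bytes), offset) in slices.iter().zip(&offsets) {
        out.resize(*offset, 0);
        out.extend_from_slice(bytes);
    }
    out
}

/// Counts how often `needle` (must be non-empty) appears in `haystack`.
fn count_occurrences(haystack: &[u8], needle: &[u8]) -> usize {
    haystack.windows(needle.len()).filter(|window| *window == needle).count()
}

fn main() {
    // Stand-in for data embedded with include_bytes! in both builds.
    let asset = b"SHARED-ASSET";
    let x86 = [&b"x86-code:"[..], &asset[..]].concat();
    let arm = [&b"arm-code:"[..], &asset[..]].concat();
    let fat = make_fat(&[(0x0100_0007, 3, &x86[..]), (0x0100_000c, 0, &arm[..])], 2);
    println!("copies of the asset: {}", count_occurrences(&fat, asset));
}
```

Because the container only records an offset and size per architecture, nothing in this layout lets two slices share a region of the file, so deduplicating embedded data would need support inside each binary's own format rather than in the merge step.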