ziglang/zig

Regression: Cross-compilation from macOS to other platforms fails in libc

alexrp opened this issue · 21 comments

See these logs:

Some examples:

ZIGCOMPILE : error : unable to build glibc shared objects: FileNotFound [/Users/runner/work/zig-msbuild-sdk/zig-msbuild-sdk/src/samples/cxxexe/cxxexe.cxxproj]
ZIGCOMPILE : error : unable to build glibc CRT file: BuildingLibCObjectFailed [/Users/runner/work/zig-msbuild-sdk/zig-msbuild-sdk/src/samples/cxxexe/cxxexe.cxxproj]
ZIGCOMPILE : error : unable to build mingw-w64 CRT file: BuildingLibCObjectFailed [/Users/runner/work/zig-msbuild-sdk/zig-msbuild-sdk/src/samples/cexe/cexe.cproj]

Hi, thank you for the bug report.

Zig 0.8.1

Please verify that the problem still occurs with master branch, and then I will re-open this issue.

@andrewrk I only just now got around to checking this again. Still happens with 0.9.0.

https://github.com/alexrp/zig-msbuild-sdk/actions/runs/1604452176

ReleaseFast

ZIGCOMPILE : error : unable to build glibc CRT file: BuildingLibCObjectFailed [/Users/runner/work/zig-msbuild-sdk/zig-msbuild-sdk/src/samples/cxxexe/cxxexe.cxxproj]
ZIGCOMPILE : error : unable to build glibc shared objects: FileNotFound [/Users/runner/work/zig-msbuild-sdk/zig-msbuild-sdk/src/samples/cxxexe/cxxexe.cxxproj]
ZIGCOMPILE : error : unable to build mingw-w64 CRT file: BuildingLibCObjectFailed [/Users/runner/work/zig-msbuild-sdk/zig-msbuild-sdk/src/samples/cexe/cexe.cproj]
ZIGCOMPILE : error : unable to build mingw-w64 CRT file: BuildingLibCObjectFailed [/Users/runner/work/zig-msbuild-sdk/zig-msbuild-sdk/src/samples/cexe/cexe.cproj]
ZIGCOMPILE : error : unable to build mingw-w64 CRT file: BuildingLibCObjectFailed [/Users/runner/work/zig-msbuild-sdk/zig-msbuild-sdk/src/samples/cxxexe/cxxexe.cxxproj]

ReleaseSmall

ZIGCOMPILE : error : unable to build glibc CRT file: BuildingLibCObjectFailed [/Users/runner/work/zig-msbuild-sdk/zig-msbuild-sdk/src/samples/cxxexe/cxxexe.cxxproj]
ZIGCOMPILE : error : unable to build glibc shared objects: FileNotFound [/Users/runner/work/zig-msbuild-sdk/zig-msbuild-sdk/src/samples/cxxexe/cxxexe.cxxproj]

@andrewrk can this be reopened? (My understanding is that 0.9.0 was basically equivalent to master when I tested right after it was released.)

Can you suggest how to reproduce this issue? What commands cause those errors to be produced?

It should be reproducible like this without installing anything .NET-related:

$ git clone git@github.com:alexrp/zig-msbuild-sdk.git
$ cd zig-msbuild-sdk/src/samples/cexe
$ ZIG_LOCAL_CACHE_DIR=obj/Debug/linux-x64/zig-cache zig cc -fdiagnostics-format=msvc -target x86_64-linux.3.10-gnu.2.17 -fPIE -Og -g -std=gnu2x -fexceptions -fno-strict-aliasing -Wanon-enum-enum-conversion -Wassign-enum -Wcompletion-handler -Wconditional-uninitialized -Wdeprecated -Wextra -Wformat-pedantic -Wformat-type-confusion -Wimplicit-fallthrough -Wkeyword-macro -Wloop-analysis -Wover-aligned -Wshadow-all -Wswitch-enum -Wall -Warray-bounds-pointer-arithmetic -Wc++-compat -Wcast-align -Wcast-qual -Wcomma -Wfloat-equal -Wpointer-arith -Wshift-sign-overflow -Walloca -Wnon-gcc -Wsigned-enum-bitfield -Werror=newline-eof -Wconsumed -Wnullable-to-nonnull-conversion -Wthread-safety -Werror=date-time -no-canonical-prefixes -fdebug-compilation-dir . -fno-lto -D ZIG_MAJOR=0 -D ZIG_MINOR=9 -D ZIG_PATCH=0 -D "ZIG_VERSION=\"0.9.0\"" -D ZIG_RID_LINUX_X64 -D ZIG_CPU_X86_64 -D ZIG_BIT_64 -D ZIG_OS_LINUX -D ZIG_ABI_GNU -D ZIG_BO_LITTLE -D ZIG_LIB_GLIBC -D "ZIG_RID=\"linux-x64\"" -D "ZIG_CPU=\"x86_64\"" -D ZIG_BIT=64 -D "ZIG_OS=\"linux\"" -D "ZIG_ABI=\"gnu\"" -D "ZIG_BO=\"little\"" -D "ZIG_LIB=\"glibc\"" -D ZIG_CFG_DEBUG -D "ZIG_CFG=\"Debug\"" -D "ZIG_PKG_AUTHORS=\"cexe\"" -D "ZIG_PKG_COPYRIGHT=\"\"" -D "ZIG_PKG_DESCRIPTION=\"\"" -D "ZIG_PKG_LICENSE=\"\"" -D "ZIG_PKG_NAME=\"cexe\"" -D "ZIG_PKG_PRODUCT=\"cexe\"" -D "ZIG_PKG_REPOSITORY=\"\"" -D "ZIG_PKG_VERSION=\"1.0.0\"" -D "ZIG_PKG_WEBSITE=\"\"" -I . -include prelude.h -Wl,-rpath,'$ORIGIN' main.c -o obj/Debug/linux-x64/cexe -gen-cdb-fragment-path obj/Debug/linux-x64/cdb

(I don't have a macOS machine to test directly on, but the command that gets executed by the SDK should look like the above.)

I'm sure a lot of these flags aren't necessary to reproduce the issue, but without a macOS machine, I can't easily narrow it down. It's worth noting that the sample projects get built in parallel so it's not unthinkable that this might be some sort of cache issue (although as you can see, every build gets a separate ZIG_LOCAL_CACHE_DIR for the configuration/platform combination being targeted, so maybe not).


If instead you want to reproduce with .NET 6 installed, as is done on the CI run linked earlier, you can do the following:

$ git clone git@github.com:alexrp/zig-msbuild-sdk.git
$ cd zig-msbuild-sdk
$ dotnet build
$ dotnet build src/samples -c Debug -v:n # or Release

This will build all the samples in parallel for every target triple supported by the SDK, in either Debug or Release mode. The -v:n flag ensures that all zig invocations are logged in full along with their output. (This reminded me that I needed to do that for my CI runs.)

@alexrp Hi Alex! I've tried reproing your issue locally on my Intel MBP but everything builds fine - I used the first 3 commands you listed. One observation though: is it on purpose that your command targets x86_64-linux or should it actually be x86_64-windows instead?

@kubkon I'm not sure I follow; why x86_64-windows specifically?

If you take a look through the logs posted above (#100, #101, #106, #111), you'll see that there are various cross-compilation issues going from macOS to any other platform (Windows/Linux, x86/x86-64/ARM64...). These didn't show up in 0.8.0.

I've kicked off a few extra builds (with vezel-dev/zig-sdk@3596f4a in place) to hopefully get some more useful logs as the failures seem to be somewhat random: #112, #113, #114

@alexrp oh, ok, so could you try with latest master instead of 0.9.0? Here's the download link to the latest nightly: https://ziglang.org/builds/zig-macos-x86_64-0.10.0-dev.665+f0400ad93.tar.xz It didn't repro for me locally with latest master and I'm curious if it will be the same for you. Also, while we are here, would you mind explaining how to use dotnet? In particular, where do I set the path to the zig binary?

could you try with latest master instead of 0.9.0?

I can try to find some time to set up a repo to test it on GitHub Actions; I don't have a macOS machine, so I'll need to hack together a workflow to do it.

In particular, where do I set the path to the zig binary?

The toolsets set ZigExePath to tell the SDK where to find zig / zig.exe.

To override that ZigExePath definition from the command line, you would do something like:

$ dotnet build src/samples -c Debug -v:n -p:ZigExePath=/path/to/zig

(You don't need to override the other variables, such as ZigLibPath, as these aren't needed/used by the SDK. They're just there in case users of the toolset packages might need them.)

Thanks @alexrp! So as pointed out above, I did run the commands you posted and got no errors with latest zig on x86_64-macos, so perhaps this regression has been fixed in latest master, which is why I'd like you to confirm that it also works for you with latest master. Then, I'll start bisecting which commit fixed it and cherry-pick that into 0.9.1.

I'm still interested in working with you @alexrp to solve this problem but we're so far unable to reproduce the issue on our side of things, so I'm going to lower the severity of this.

If we can get the problem confirmed on our end with a test case then I'll bump it back up.

I encountered a similar issue here with zig 0.9.1: https://github.com/messense/cargo-zigbuild/runs/5321183640?check_suite_focus=true

  = note: warning: unsupported linker arg: --disable-auto-image-base
          error(compilation): /usr/local/lib/python3.9/site-packages/ziglang/lib/libc/mingw/math/x86/remainderl.S:1:1: unable to build C object: FileNotFound
          error(compilation): /usr/local/lib/python3.9/site-packages/ziglang/lib/libc/mingw/misc/mingw-aligned-malloc.c:1:1: unable to build C object: FileNotFound
          error(compilation): /usr/local/lib/python3.9/site-packages/ziglang/lib/libc/mingw/math/erfl.c:1:1: unable to build C object: FileNotFound
          warning(module): unable to save cached ZIR code for /usr/local/lib/python3.9/site-packages/ziglang/lib/std/special/ssp.zig to /Users/runner/.cache/zig/z/498057bc0df0132aa91239434ad18d6f: FileNotFound

  = note: warning: unsupported linker arg: --disable-auto-image-base
          error(compilation): /usr/local/lib/python3.9/site-packages/ziglang/lib/libc/mingw/math/x86/tanf.c:1:1: unable to build C object: FileNotFound
          error(compilation): /usr/local/lib/python3.9/site-packages/ziglang/lib/libc/mingw/stdio/ulltow.c:1:1: unable to build C object: FileNotFound
          error: unable to build mingw-w64 CRT file: BuildingLibCObjectFailed

I can't find a consistent way to trigger this issue, but I am also seeing the errors in @alexrp's original comment occasionally.

In my case I'm using Zig as a cross-compiler for Golang's cgo, and it seems to only be triggered when multiple targets are being compiled at once, but it's not predictable.

Sample error messages from my project (this is from one build job where two targets were being compiled at once):

# runtime/cgo

error(compilation): /Users/runner/work/docker-client/docker-client/golang-wrapper/build/tools/zig-0.9.1/lib/libc/mingw/math/sf_erf.c:1:1: unable to build C object: FileNotFound
error(compilation): /Users/runner/work/docker-client/docker-client/golang-wrapper/build/tools/zig-0.9.1/lib/libc/mingw/complex/cargf.c:1:1: unable to build C object: FileNotFound
warning(module): unable to save cached ZIR code for /Users/runner/work/docker-client/docker-client/golang-wrapper/build/tools/zig-0.9.1/lib/std/special/ssp.zig to /Users/runner/.cache/zig/z/498057bc0df0132aa91239434ad18d6f: FileNotFound

# runtime/cgo
error(compilation): /Users/runner/work/docker-client/docker-client/golang-wrapper/build/tools/zig-0.9.1/lib/libc/mingw/crt/crtexe.c:1:1: unable to build C object: FileNotFound
error: unable to build mingw-w64 CRT file: BuildingLibCObjectFailed

Another example:

# runtime/cgo
error(compilation): /Users/runner/work/docker-client/docker-client/golang-wrapper/build/tools/zig-0.9.1/lib/libc/mingw/crt/crtexe.c:1:1: unable to build C object: FileNotFound
error: unable to build mingw-w64 CRT file: BuildingLibCObjectFailed
# runtime/cgo
error(compilation): /Users/runner/work/docker-client/docker-client/golang-wrapper/build/tools/zig-0.9.1/lib/libc/mingw/complex/conj.c:1:1: unable to build C object: FileNotFound
error: unable to build mingw-w64 CRT file: BuildingLibCObjectFailed

Is there some additional logging I can enable that would help you diagnose what's going on here?

This is what I've found so far:

  • it seems to only affect compiling Windows targets from macOS hosts
  • it seems to only happen on a clean machine (i.e. with no Zig-related caches present)
  • it seems to only happen when multiple Windows targets are compiled at once
  • it seems to be related to the global cache: setting the ZIG_GLOBAL_CACHE_DIR environment variable to a different directory for each target seems to resolve the issue
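The conditions in this list (no existing caches, several Windows targets at once, one shared global cache) could be recreated with something like the sketch below. It is only an attempt at a reproducer, not a confirmed one: the function name, the `ZIG` override variable, the `main.c` source file and the example target triples are all assumptions made for illustration. Only `zig cc -target` and `ZIG_GLOBAL_CACHE_DIR` come from the thread.

```shell
#!/bin/sh
# Sketch of a reproduction attempt; not a confirmed reproducer.
# Assumptions (not from the thread): main.c is any trivial C file in the
# current directory, and the target triples passed in are just examples.

ZIG="${ZIG:-zig}"   # path to the zig binary; override to test another build

repro_parallel_windows() {
    # A fresh, empty directory shared by all jobs stands in for a clean
    # machine with a single global cache.
    ZIG_GLOBAL_CACHE_DIR="$(mktemp -d)" || return 1
    export ZIG_GLOBAL_CACHE_DIR

    failed=0
    pids=""
    for target in "$@"; do
        # Start every compile at once so the jobs race on the shared cache.
        "$ZIG" cc -target "$target" main.c -o "out-$target.exe" &
        pids="$pids $!"
    done

    # Wait for each job and count how many exited with an error.
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed of $# jobs failed"
}
```

A call such as `repro_parallel_windows x86_64-windows-gnu aarch64-windows-gnu` would then launch both builds against the same empty cache. Since the failures reported here are intermittent, a single clean run would not prove much either way.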

Thanks for digging further into this @charleskorn. I've been swamped lately and haven't had time to. ☹️

  • it seems to only affect compiling Windows targets from macOS hosts

FWIW, in my logs above, it happened when cross-compiling to both Windows and Linux...

  • it seems to only happen when multiple Windows targets are compiled at once

... but maybe the reason why the Linux builds failed is that multiple Windows builds are running at the same time? In my case, I build a whole bunch of projects for a whole bunch of targets, all in parallel:

https://github.com/vezel-dev/zig-msbuild-sdk/blob/d33f80b20bcb705457ab4f5f06389f1200749f82/src/sdk/build/Zig.Sdk.Defaults.targets#L68-L77

  • it seems to only happen on a clean machine (i.e. with no Zig-related caches present)
  • it seems to be related to the global cache: setting the ZIG_GLOBAL_CACHE_DIR environment variable to a different directory for each target seems to resolve the issue

This is an interesting find. I honestly wasn't even aware that there is a separate global cache even when a local cache directory is set. (#11394 is very relevant here.) I'll try to experiment with non-shared ZIG_GLOBAL_CACHE_DIRs and see if that works around the issue in my case.

'Good' news: Setting ZIG_GLOBAL_CACHE_DIR to a build-local directory (vezel-dev/zig-sdk@af73eb3) seems to have worked around the issue in my case as well: https://github.com/vezel-dev/zig-sdk/actions/runs/2107172418

Not ideal as it hurts build times quite a bit (50-60%) but at least there are no more random CI failures. I guess we can safely conclude that there was some kind of caching regression from 0.8.0 to 0.8.1.
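For anyone driving zig from a plain shell script rather than MSBuild, the same workaround might look roughly like the sketch below. The directory layout, the `CACHE_ROOT` and `ZIG` variables and both function names are hypothetical and are not what zig-sdk does internally. The only part taken from this thread is giving each target its own `ZIG_GLOBAL_CACHE_DIR`.

```shell
#!/bin/sh
# Sketch of the per-target global cache workaround.
# Assumptions: the obj/zig-global-cache layout and the helper names are
# illustrative only; adjust them to the build system in use.

ZIG="${ZIG:-zig}"
CACHE_ROOT="${CACHE_ROOT:-$PWD/obj/zig-global-cache}"

# Print the global cache directory reserved for one target triple.
global_cache_for() {
    printf '%s/%s\n' "$CACHE_ROOT" "$1"
}

# Run `zig cc` for one target with a global cache that no other target shares.
zig_cc_isolated() {
    target="$1"
    shift
    dir="$(global_cache_for "$target")"
    mkdir -p "$dir" || return 1
    # The variable is set for this single invocation only.
    ZIG_GLOBAL_CACHE_DIR="$dir" "$ZIG" cc -target "$target" "$@"
}
```

Parallel builds would then be started as, for example, `zig_cc_isolated x86_64-windows-gnu main.c -o cexe.exe &` once per target, followed by `wait`. Each target rebuilds its own copy of libc, which is presumably where the extra build time comes from.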

Having this same issue when cross-compiling on darwin with nix:
https://gist.github.com/Cloudef/acb74ff9e36ab41709479240596ab501

The cache dirs do not seem to matter. This happens very often when building bzip2:
https://github.com/NixOS/nixpkgs/blob/master/pkgs/tools/compression/bzip2/default.nix#L50
I have a feeling it has something to do with parallel builds (the build system spawning multiple zig cc's).

Finally got around to checking this again. Kicked off a bunch of builds using Zig 0.13.0 with the ZIG_GLOBAL_CACHE_DIR workaround reverted: https://github.com/vezel-dev/zig-sdk/actions/workflows/build.yml?query=branch%3Awork%2Frevert-cache-workaround

Based on the results so far, I'm inclined to say that this issue has since been fixed. But that's just on my end. @messense @charleskorn @Cloudef @eatonphil @batiati would you all be able to try and see if you can still reproduce this?

I am going to close this on the assumption that it has since been fixed.

@messense @charleskorn @Cloudef @eatonphil @batiati if you are still able to repro this, please comment and I'll reopen.

Hi @alexrp!
Yes, I can confirm it has been fixed!

We removed the workaround when we upgraded to Zig 0.13:
tigerbeetle/tigerbeetle@f06a804