ziglang/zig

Possibility to disable caching for user code

kmicklas opened this issue · 7 comments

Zig Version

0.10.0-dev.2783+76546b3f8

Steps to Reproduce

Build a very large binary with zig cc as the C toolchain. In our case, the binary is 584 MB.

Expected Behavior

Currently, the Zig cache is expected to grow very large. However, when an external build system (in our case Bazel) orchestrates the build and caches the results, we have no need for the Zig cache outside "internal" builds (libc stubs, compiler-rt, etc.). It would be great to have a flag that caches only these internal builds, and not user code.

Actual Behavior

With our 584 MB binary, the Zig cache is 761 MB after building from a clean cache.

To shed some light on the issue: when building some subtrees in Uber's Go monorepo, the zig-cache promptly grows to hundreds of gigabytes, mostly duplicating Bazel's cache. This may not be much for a single workstation, but it multiplies across a large number of engineers and adds up to real cost on CI and dev machines.

I think another route here could be for Zig to understand how to interact with Bazel's (or other) cache system(s), so that it still knows when to skip work and saves CPU cycles, not just disk space.

@nektro In our case I believe just not caching user code is sufficient. Bazel already caches user builds at the level of granularity that we care about. However, we wouldn't want to turn off Zig's caching entirely, because then we would waste cycles rebuilding smaller units that Bazel doesn't know about.

There may be another way: instead of copying, try to reflink the files. If the setup is "careful enough", this will be faster and simply save space. "Careful enough" means (see the sketch after this list):

  • a supported file system: as of Linux 5.19, that's btrfs, cifs, nfs, ocfs2, overlayfs, and xfs.
  • the Zig cache and the target directories are on the same mount point.
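
A minimal sketch of what that could look like from a shell. The paths are hypothetical (the hash is borrowed from the link command later in this thread):

# SRC is an artifact inside the zig cache; DEST is where the build
# system wants it. Both must be on the same mount of a clone-capable
# file system (btrfs, xfs, ...).
SRC=/tmp/bazel-zig-cc/o/19a3b4300b8c9d527d64a608f66d4a8c/big-binary
DEST=bazel-out/k8-fastbuild/bin/big-binary

# --reflink=always fails loudly instead of silently degrading to a full
# copy, so a misconfigured setup is caught immediately. On success the
# destination shares data extents with the source; no new blocks are written.
cp --reflink=always "$SRC" "$DEST"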

I will investigate this a bit more.

There may be another way

Correction: it won't fully solve the Bazel caching issue, because bazel clean will not wipe that cache, as Bazel's UX would lead one to expect.

Copy-with-reflink, when available, is IMO a worthwhile optimization nevertheless.

eb3f7d2 partially fixes it. However, the cache is still used for commands that invoke zig ld.lld.

For example, this command produces a huge binary:

env - \
    CGO_ENABLED=1 \
    ZIG_VERBOSE_LINK=1 \
    GOARCH=amd64 \
    GOOS=linux \
    GOPATH=''  \
    GOROOT=external/go_sdk \
    GOROOT_FINAL=GOROOT \
    PATH=/home/user/.cache/bazel/_bazel_motiejus/b97476d719d716accead0f2d5b93104f/external/zig_sdk/tools:/bin:/usr/bin \
    bazel-out/k8-opt-exec-2B5CBBC6/bin/external/go_sdk/builder \
    '-param=bazel-out/k8-fastbuild/bin/src/long-path_/big-binary-0.params' \
    -- \
    -extld /home/user/.cache/bazel/_bazel_motiejus/b97476d719d716accead0f2d5b93104f/external/zig_sdk/tools/c++ \
    '-buildid=redacted' \
    -extldflags '-target x86_64-linux-gnu.2.19 -lc++abi -Wl,--version-script,/home/user/.cache/bazel/_bazel_motiejus/b97476d719d716accead0f2d5b93104f/external/zig_sdk/glibc-hacks/fcntl.map -fno-lto -Wl,-S'

Here is the output of zig ld.lld, thanks to ZIG_VERBOSE_LINK:

ld.lld -error-limit=0 -O3 -z stack-size=16777216 --gc-sections --eh-frame-hdr --export-dynamic -s -znow -m elf_x86_64 -o /tmp/bazel-zig-cc/o/19a3b4300b8c9d527d64a608f66d4a8c/big-binary /tmp/bazel-zig-cc/o/3e9807437e08d3a4223353cc97d6de44/Scrt1.o /tmp/bazel-zig-cc/o/58b296b5edb38c89a90b612913f64ee6/crti.o -dynamic-linker /lib64/ld-linux-x86-64.so.2 /tmp/go-link-800448214/go.o /tmp/go-link-800448214/000000.o /tmp/go-link-800448214/000001.o /tmp/go-link-800448214/000002.o /tmp/go-link-800448214/000003.o /tmp/go-link-800448214/000004.o /tmp/go-link-800448214/000005.o /tmp/go-link-800448214/000006.o /tmp/go-link-800448214/000007.o /tmp/go-link-800448214/000008.o /tmp/go-link-800448214/000009.o /tmp/go-link-800448214/000010.o /tmp/go-link-800448214/000011.o /tmp/go-link-800448214/000012.o /tmp/go-link-800448214/000013.o /tmp/go-link-800448214/000014.o /tmp/go-link-800448214/000015.o /tmp/go-link-800448214/000016.o /tmp/go-link-800448214/000017.o /tmp/go-link-800448214/000018.o /tmp/go-link-800448214/000019.o /tmp/go-link-800448214/000020.o /tmp/go-link-800448214/000021.o /tmp/go-link-800448214/000022.o /tmp/go-link-800448214/000023.o /tmp/go-link-800448214/000024.o /tmp/go-link-800448214/000025.o /tmp/go-link-800448214/000026.o /tmp/go-link-800448214/000027.o /tmp/go-link-800448214/000028.o /tmp/go-link-800448214/000029.o /tmp/go-link-800448214/000030.o /tmp/go-link-800448214/000031.o /tmp/go-link-800448214/000032.o /tmp/go-link-800448214/000033.o /tmp/go-link-800448214/000034.o /tmp/go-link-800448214/000035.o /tmp/go-link-800448214/000036.o /tmp/go-link-800448214/000037.o /tmp/go-link-800448214/000038.o /tmp/go-link-800448214/000039.o /tmp/go-link-800448214/000040.o /tmp/go-link-800448214/000041.o /tmp/go-link-800448214/000042.o /tmp/go-link-800448214/000043.o /tmp/go-link-800448214/000044.o /tmp/go-link-800448214/000045.o /tmp/go-link-800448214/000046.o /tmp/go-link-800448214/000047.o /tmp/go-link-800448214/000048.o /tmp/go-link-800448214/000049.o /tmp/go-link-800448214/000050.o /tmp/go-link-800448214/000051.o /tmp/go-link-800448214/000052.o /tmp/go-link-800448214/000053.o /tmp/go-link-800448214/000054.o /tmp/go-link-800448214/000055.o /tmp/go-link-800448214/000056.o /tmp/go-link-800448214/000057.o /tmp/go-link-800448214/000058.o /tmp/go-link-800448214/000059.o /tmp/go-link-800448214/000060.o /tmp/go-link-800448214/000061.o /tmp/go-link-800448214/000062.o /tmp/go-link-800448214/000063.o /tmp/go-link-800448214/000064.o /tmp/go-link-800448214/000065.o /tmp/go-link-800448214/000066.o /tmp/go-link-800448214/000067.o /tmp/go-link-800448214/000068.o /tmp/go-link-800448214/000069.o /tmp/go-link-800448214/000070.o /tmp/go-link-800448214/000071.o /tmp/go-link-800448214/000072.o /tmp/go-link-800448214/000073.o bazel-out/k8-fastbuild/bin/external/icu4c/libicu-base.a bazel-out/k8-fastbuild/bin/external/icu4c/libicudata.a bazel-out/k8-fastbuild/bin/external/icu4c/libicu-base.a bazel-out/k8-fastbuild/bin/external/icu4c/libicudata.a bazel-out/k8-fastbuild/bin/external/icu4c/libicu-base.a bazel-out/k8-fastbuild/bin/external/icu4c/libicudata.a /tmp/bazel-zig-cc/o/b0efcd0d817d6d015557a550d36de22d/libcompiler_rt.a --as-needed /tmp/bazel-zig-cc/o/42b73741d9c28ff128d37346e17eca90/libc++abi.a /tmp/bazel-zig-cc/o/51177ea50d24d8d41e0445f6d80456ea/libc++.a /tmp/bazel-zig-cc/o/44d39e39167b448eb425d87cf7f8d683/libunwind.a /tmp/bazel-zig-cc/o/ca4fd105f3e12c84e58b663903099d18/libm.so.6 /tmp/bazel-zig-cc/o/ca4fd105f3e12c84e58b663903099d18/libpthread.so.0 /tmp/bazel-zig-cc/o/ca4fd105f3e12c84e58b663903099d18/libc.so.6 
/tmp/bazel-zig-cc/o/ca4fd105f3e12c84e58b663903099d18/libdl.so.2 /tmp/bazel-zig-cc/o/ca4fd105f3e12c84e58b663903099d18/librt.so.1 /tmp/bazel-zig-cc/o/ca4fd105f3e12c84e58b663903099d18/libld.so.2 /tmp/bazel-zig-cc/o/ca4fd105f3e12c84e58b663903099d18/libutil.so.1 /tmp/bazel-zig-cc/o/ca4fd105f3e12c84e58b663903099d18/libresolv.so.2 /tmp/bazel-zig-cc/o/a17e47fdcaad26cf47d6a6ed1086e4d6/libc_nonshared.a /tmp/bazel-zig-cc/o/22f0776e1b498fa69f08affbc85f03c5/crtn.o --allow-shlib-undefined

Note the -o /tmp/bazel-zig-cc/o/19a3b4300b8c9d527d64a608f66d4a8c/big-binary in the ld.lld command: this binary is 693 MB. If we build only a single target, the total zig cache size is 751 MB, which is much better than it used to be, but still not desirable.
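
For reference, a quick way to see where the space goes (a sketch: the cache root matches the paths in the log above, and big-binary stands in for the real target name):

du -sh /tmp/bazel-zig-cc                  # total cache size after one build
find /tmp/bazel-zig-cc -name big-binary   # locate the duplicated output inside the cache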

I am evaluating the following change:

--- a/src/main.zig
+++ b/src/main.zig
@@ -2110,7 +2110,6 @@ fn buildOutputType(
                 .link => {
                     output_mode = if (is_shared_lib) .Lib else .Exe;
                     emit_bin = if (out_path) |p| .{ .yes = p } else EmitBin.yes_a_out;
-                    enable_cache = true;
                     if (emit_llvm) {
                         fatal("-emit-llvm cannot be used when linking", .{});
                     }

I think this may be another case where a new user-facing flag is not required, because with this change, zig cc -o foo foo.c would behave the same as zig build-exe foo.c -lc, including caching behavior.
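
A quick way to sanity-check that equivalence once the patch lands (a sketch: foo.c is a stand-in, and ~/.cache/zig is the default global cache location on Linux):

echo 'int main(void) { return 0; }' > foo.c

du -sh ~/.cache/zig      # cache size before
zig cc -o foo foo.c      # the C-compiler entry point
zig build-exe foo.c -lc  # the native CLI equivalent
du -sh ~/.cache/zig      # growth should no longer include a copy of the output binary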

If this change works out, then I believe this issue will be completely resolved; no changes will be needed on your end other than upgrading to a newer Zig version that includes this patch.