ziglang/zig

directly output LLVM bitcode rather than using LLVM's IRBuilder API

andrewrk opened this issue · 17 comments

Zig can be built with or without -Denable-llvm. Currently, Zig is not very useful without enabling LLVM extensions. However, as we move into the future, Zig is intending to compete directly with LLVM, making builds of Zig without LLVM a compelling option for the backends directly supported by Zig.

There are a few reasons why one might want an LLVM-less binary:

  • The executable is 4.4 MiB instead of 169 MiB.
  • Bootstrapping it only requires a C compiler instead of requiring a modern C++ compiler, Python 3, bash, and CMake (also C++).
    • This would make it much easier to obtain a Zig compiler on a new operating system or a limited environment such as a calculator.

This proposal is to treat LLVM bitcode files (.bc) as the target output format, rather than going through the C++ IRBuilder API. This would make it possible for even non-LLVM-enabled builds of Zig to still output LLVM IR that could be consumed by Clang, other LLVM tools, or integrated with other software.
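To make "emitting .bc directly" concrete, here is a minimal sketch of the two lowest-level pieces a bitcode writer needs: the four magic bytes that open a raw bitcode stream, and the variable-bit-rate (VBR) integer encoding used throughout the bitstream container. `BitWriter` and its methods are hypothetical names for illustration, not the Zig compiler's actual implementation, and the fixed 64-byte buffer is an assumption that keeps the example short.

```zig
const std = @import("std");

/// Every raw LLVM bitcode stream opens with these four bytes: 'B', 'C', 0xC0, 0xDE.
const bitcode_magic = [4]u8{ 'B', 'C', 0xC0, 0xDE };

/// Hypothetical helper for illustration: packs bits into a fixed buffer,
/// least significant bit first, as the bitstream container expects.
const BitWriter = struct {
    buf: [64]u8 = [_]u8{0} ** 64,
    bit_len: usize = 0,

    /// Append the low `width` bits of `value`.
    fn writeFixed(self: *BitWriter, value: u32, width: u5) void {
        const bit_masks = [8]u8{ 1, 2, 4, 8, 16, 32, 64, 128 };
        var v = value;
        var i: u5 = 0;
        while (i < width) : (i += 1) {
            if ((v & 1) != 0) self.buf[self.bit_len / 8] |= bit_masks[self.bit_len % 8];
            v >>= 1;
            self.bit_len += 1;
        }
    }

    /// Append `value` as a variable-width integer in `width`-bit chunks.
    /// The top bit of each chunk says whether another chunk follows.
    fn writeVbr(self: *BitWriter, value: u32, width: u5) void {
        std.debug.assert(width >= 2);
        const threshold = @as(u32, 1) << (width - 1);
        var v = value;
        while (v >= threshold) {
            self.writeFixed((v & (threshold - 1)) | threshold, width);
            v >>= (width - 1);
        }
        self.writeFixed(v, width);
    }

    /// Append the four magic bytes as four 8-bit fixed-width fields.
    fn writeMagic(self: *BitWriter) void {
        for (bitcode_magic) |byte| self.writeFixed(byte, 8);
    }
};

test "magic bytes, then 27 as vbr4" {
    var w = BitWriter{};
    w.writeMagic();
    // Only demonstrates the encoding; a real stream continues with blocks and records.
    w.writeVbr(27, 4);
    try std.testing.expect(std.mem.eql(u8, w.buf[0..4], "BC\xC0\xDE"));
    try std.testing.expect(w.buf[4] == 0x3B);
    try std.testing.expect(w.bit_len == 40);
}
```

The test reproduces the example from LLVM's bitcode format documentation, where 27 emitted as vbr4 becomes the chunks 1011 and 0011. A real writer would layer abbreviation IDs, blocks, and records on top of these primitives.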

One example user story comes from Roc. I'd like to get @rtfeldman's take on this - I know that you're using Zig to output .bc files, but then what happens? Does a different tool compile that together with other code, or do you use Zig for the final link step too? I'm guessing that Roc would be able to use the non-LLVM-enabled Zig binaries for their use case.

There is a second major reason for this proposal, perhaps an even better argument in favor of it: making incremental compilation work more robustly. As the Zig project moves forward, we want to make CacheMode.incremental the default for all backends including LLVM (caddbbc). This means we would want to save the LLVM IR module (.bc) with every compilation and restore it for subsequent compilations, using the IRBuilder API to add and remove declarations as necessary from the LLVM IR module, keeping the .bc file on disk in sync for future incremental compilations.

However... the API lacks functionality. For example, aliases cannot be deleted:

zig/src/codegen/llvm.zig

Lines 1330 to 1348 in e67c756

// TODO LLVM C API does not support deleting aliases. We need to
// patch it to support this or figure out how to wrap the C++ API ourselves.
// Until then we iterate over existing aliases and make them point
// to the correct decl, or otherwise add a new alias. Old aliases are leaked.
for (exports[1..]) |exp| {
    const exp_name_z = try module.gpa.dupeZ(u8, exp.options.name);
    defer module.gpa.free(exp_name_z);
    if (self.llvm_module.getNamedGlobalAlias(exp_name_z.ptr, exp_name_z.len)) |alias| {
        alias.setAliasee(llvm_global);
    } else {
        _ = self.llvm_module.addAlias(
            llvm_global.globalGetValueType(),
            0,
            llvm_global,
            exp_name_z,
        );
    }
}

If Zig were in control of outputting the .bc file instead, then Zig could simply not emit aliases that are not supposed to exist. We would no longer be limited by what the IRBuilder API can do. This would make the LLVM backend very similar to the WebAssembly backend in the sense that it gains a linking component and directly outputs the module.
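A minimal sketch of that idea, assuming the writer rebuilds the alias list from the current set of export names on every emit. `Alias` and `collectAliases` are hypothetical names, and the first export is assumed to name the global itself, mirroring the `exports[1..]` loop above. Because nothing is mutated in place, an export that disappears leaves no stale alias behind.

```zig
const std = @import("std");

/// Hypothetical stand-in for one alias record in a serialized module.
const Alias = struct { name: []const u8, aliasee: []const u8 };

/// Fill `out` with one alias per current export beyond the first,
/// which is assumed to name the global itself. Returns the used prefix.
fn collectAliases(exports: []const []const u8, out: []Alias) []Alias {
    if (exports.len == 0) return out[0..0];
    std.debug.assert(out.len >= exports.len - 1);
    var n: usize = 0;
    for (exports[1..]) |name| {
        out[n] = .{ .name = name, .aliasee = exports[0] };
        n += 1;
    }
    return out[0..n];
}

test "a removed export leaves no stale alias" {
    var storage: [4]Alias = undefined;

    // First compilation: two extra export names, so two aliases.
    const before = collectAliases(&[_][]const u8{ "foo", "foo_v1", "foo_old" }, &storage);
    try std.testing.expect(before.len == 2);

    // Next compilation: "foo_old" is gone from the source, so it is simply not emitted.
    const after = collectAliases(&[_][]const u8{ "foo", "foo_v1" }, &storage);
    try std.testing.expect(after.len == 1);
    try std.testing.expect(std.mem.eql(u8, after[0].name, "foo_v1"));
    try std.testing.expect(std.mem.eql(u8, after[0].aliasee, "foo"));
}
```

Compare this with the workaround quoted above, where an existing alias can only be re-pointed and old ones are leaked.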

Finally, in the incremental compilation sense, Zig would already be trying to keep a .bc file on disk up-to-date via the IRBuilder API. Doing it directly instead of via a limited API is a more direct way to solve the problem, and the performance would be in our hands rather than in the hands of the LLVM project.

I think these two reasons combined make this proposal worth seriously considering, despite the downsides of taking on additional maintenance with LLVM upgrades, and introducing an entirely new class of bugs from generating malformed .bc files.

Another motivation for this would be to reduce memory usage - it seems the main culprit of using a lot of memory when using the LLVM backend is LLVM itself:

[image: massif visualization]

I'm so excited about this proposal.

One future direction that could benefit from this is the new GlobalISel pipeline, which I think is production-ready for arm64.

By decoupling LLVM IR generation from the Zig pipeline, we could target GMIR directly and take advantage of the years of huge effort by Apple.

I'd like to get @rtfeldman's take on this - I know that you're using Zig to output .bc files, but then what happens? Does a different tool compile that together with other code, or do you use Zig for the final link step too? I'm guessing that Roc would be able to use the non-LLVM-enabled Zig binaries for their use case.

Yep, I think we should be fine. Basically what we do for our optimized build is:

  • Our zig standard library gets compiled into .bc files
  • We load those up in LLVM to get our initial modules
  • We compile the user's .roc code directly into those LLVM modules
  • We tell LLVM to convert the modules to binary

So as long as Zig still supports emitting .bc files, we should be fine! 👍

I just want to highlight that LLVM bitcode is not stable, so this will add friction for the user. Is the plan to still have integration with clang, but via shelling out rather than linking? Adding that either in the compiler or in build.zig would provide a smoother experience as it would ensure that the user does not need to deal with this versioning stuff.

https://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility

You can emit outdated LLVM bitcode and get away with it; it's relatively stable in that way: "The current LLVM version supports loading any bitcode since version 3.0."

@Snektron you seem to be getting this issue confused with #16270. This issue, when implemented, will mean that Zig outputs .bc files compatible with the same version of LLVM that Zig links against, in memory, and then uses LLVMParseBitcodeInContext2 to convert that to an LLVMModuleRef rather than using the LLVMIRBuilder API. There will be no visible difference for users, except that it will be faster, and incremental compilation will work better. It means targeting the bitcode format as an ABI rather than targeting the LLVMIRBuilder API, greatly reducing our LLVM API surface area.

Does this mean we still rely on LLVM or an LLVM-compatible backend for machine code generation?

mlugg commented

Yes - this issue isn't related to moving away from LLVM, but simply an implementation detail in terms of how we emit LLVM IR (/bitcode). From the user perspective, this change should have no effect on the compiler's functionality.

Progress:

certik commented

@andrewrk I have a question regarding:

There will be no visible difference for users, except that it will be faster, ...

We are also considering using this bitcode approach for LFortran (lfortran/lfortran#2587). The benefits are clear, but I do not understand how it can be faster.

On one hand we have the C++ LLVM Builder API, which constructs the internal LLVM IR representation in memory. On the other hand we first create a binary, then ask LLVM to parse it and construct the internal LLVM IR representation in memory. If the first approach is implemented in the most efficient way possible, I think it must always be faster, mustn't it?

Assuming the C++ LLVM Builder API is currently slow, so that it is faster to just create the binary .bc first and then let LLVM parse it, does it mean that if somebody writes a faster C++ LLVM Builder API it will be able to beat the .bc approach?

Here are some reasons I expect it to be faster this way:

  • Avoiding the overhead of the C++ LLVM Builder API
  • Avoiding the overhead of the C API on top of that
  • Creating C++ objects intertwined with the Zig compiler messing around with its own memory is harder on the CPU cache than doing each one serially. If LLVM loads a bitcode file, it should be able to order the objects sequentially and do things batched and more efficiently.
  • It can be done safely in a separate thread. LLVM has had a lot of bugs related to this since Clang does not do it.

I don't think it's possible for someone to write a faster C++ LLVM Builder API. I think they are limited by C++ and the object-oriented programming paradigm the entire LLVM codebase is built upon.

That said, this is all speculation. I could very well be wrong.

certik commented

Excellent, thanks for the answer. OK, I can see that there might be a way for it to be faster. It would be great if it is; that would simplify a lot of things.

It will probably not be difficult to create a simple benchmark: construct some simple (but long) function or expression using the C++ LLVM Builder API, vs first creating a bitcode file and loading into LLVM.

That's a great idea!

@certik looks like we have some performance data to look at in #19031. @antlilja reports a 1.16x wall time speedup for this strategy as opposed to using LLVM's C++ IR Builder API.

Note that this is not the main purpose of the change, but it is a nice little side benefit.

Edit: Looks like this is not a fair comparison, since the master branch is doing some redundant work. I'll follow up if we have any more accurate measurements.

Excellent, thanks for the update. That's indeed very encouraging. The bitcode approach is nice and clean and has almost no downsides, as long as the performance is comparable.

Another idea that I got: in Debug mode compilation we do not turn on any optimizations in LLVM, and we want compilation to be as fast as possible. Unfortunately LLVM compiles very slowly (often 20x slower compared to our direct binary backend). However, we (or you!) could write an alternative code generator that takes the bitcode and generates a binary quickly. We currently use the WASM bitcode for that (we have a fast WASM to binary generator), but the advantage of using LLVM bitcode is that we could reuse the same infrastructure as the Release builds (that use LLVM with optimizations on), thus simplifying maintenance.

I don't really see the point of taking a detour through LLVM IR when the point is to compile faster. In Zig we skip straight to x86 or other machine code. Introducing a pit stop through LLVM IR would certainly be slower than not doing that.