ziglang/zig

create a self-hosted incremental linker

andrewrk opened this issue · 18 comments

LLD has no plans to do incremental linking, even though incremental linking is a perfect fit for zig's debug builds. Not good enough. Zig project will have its own linker.

This is a huge project - support has to be explicitly added for every target OS and every target architecture.

Once this project is far enough along, zig will drop its dependency on LLD. No reason to have both.

Isn't linking about as fast as cat?
If so, how is this big effort worth it versus just relinking everything?

It's about 5x slower than cat.

Also I just tried catting a debug build of clang to /dev/null and it took 3 seconds on my SSD. That's not fast enough. If you change a single function and recompile a large project, zig should only have to compile 1 function and update only the bytes of that function in the output file.

OK, as much as I'd argue against a rewrite, this does seem to make sense in the long run.
Maybe someone else will have started the same effort by the time 1.1 becomes a thing. Maybe Rust; at least they have function-level recompilation as well.

There is this Google project, the "gold" linker, whose goal is/was both fast and incremental linking:

https://www.airs.com/blog/archives/38

https://github.com/pathscale/binutils/tree/master/gold

The commit dates seem a bit dated, but the goals should be a perfect fit, so at least it could serve as a guide. Maybe it even still works; that would be quite something...

Doing a bit more digging, it seems gold did add incremental linking, but at least for full linking it's half as fast as LLD, for whatever reason.
Maybe LLD is not as slow after all.

LLD is crazy fast already, but incremental linking could help on huge projects.

Do you mean LLD will explicitly not have incremental linking?

Maybe Rust; at least they have function-level recompilation as well.

Regarding Rust, they are thinking about it as well:
rust-lang/rust#39915 (comment). Incremental linking is possible on Windows already, so I don't think it's absurd to suggest LLD will maybe add this?
(They also talk about gold; it does not seem to be completely dead, just not working for them, but maybe it works for Zig's incremental linking?)

Given that LLD is already very fast, supports many (all?) important platforms, and Zig and Rust would have the same demand...

Do you mean LLD will explicitly not have incremental linking?

Yes, I've asked them about it and others have asked them about it, and they just want a fast but deterministic linker that redoes the whole job every time.

Another motivation for this issue is that the Mach-O code in LLD is poor quality, and nobody in the LLVM community wants to improve it.

See for example https://gist.github.com/srgpqt/61163a279baa4f8d41b01a653c2635bc

A self-hosted linker would fit into the bootstrapping plan (#853), no problem. If we had a self-hosted linker and we dropped the dependency on LLD:

  • stage1 builds with the system C++ compiler and linker. (status quo)
  • stage2 builds with stage1 and the system linker.
  • stage3 builds with stage2

Not that I want to move the goalposts too much, but a custom linker means no LTO unless someone writes that as well (super hard!).

Our current plan for LTO is to emit everything into a single .o file and run -O3 on that. That's what stage1 does. It's really slow, but for a release build that's the trade-off.

Meanwhile in debug builds the plan is to split into as many .o files as would speed up the compilation.

It's possible that there will be a setting for release mode controlling how much to compromise optimization in exchange for faster build times and lower compilation memory requirements.

The other thing that makes me suspicious is that Rust manages to cross-compile, and they use LLD?

And there is also ThinLTO, which is actually MUCH faster, and that would go away as an option as well.

I mean, it's all your project, so this is just meant as some thoughts to avoid unnecessary work, but it's your decision.

Reading http://lists.llvm.org/pipermail/llvm-dev/2018-June/123782.html is indeed quite discouraging, and surprising, tbh.

I would love to know how Rust accomplishes cross-compiling for the macOS target. It's very unlikely they're using LLD.

It's very unlikely they're using LLD.

I forgot to adjust my comment by adding a reference to the Rust thread I already linked to previously.


Searching through their issues revealed another problem with creating one's own linker:
rust-lang/rust#54637

For a project at Google, we need retpoline support.

We should still support retpolines when the PLT is used. That being said, it seems to me that this is almost entirely a linker's job.


rust-lang/rust#39915 (comment)

They are trying to use LLD:

Once that's done we can advertise it to the community, asking for feedback. Here we can gain both timing information as well as bug reports to send to LLD. If everything goes smoothly (which is sort of doubtful with a whole brand new linker, but hey you never know!) we can turn it on by default, otherwise we can work to stabilize the selection of LLD and then add an option to Cargo.toml so projects can at least opt-in to it.


About the current status, here is a search for LLD-related commits:
https://github.com/rust-lang/rust/search?q=lld&type=Commits

As far as I can tell, they have their own LLD fork like Zig, which they call rust-lld,
at least for ARM: https://www.reddit.com/r/rust/comments/9a7te2/nightly_rust_is_switching_to_use_lld_llvms_new/#bottom-comments

rg -F "linker: Some("
src/librustc_target/spec/riscv32imac_unknown_none_elf.rs
28:            linker: Some("rust-lld".to_string()),

src/librustc_target/spec/windows_base.rs
77:        linker: Some("gcc".to_string()),

src/librustc_target/spec/thumb_base.rs
46:        linker: Some("rust-lld".to_string()),

src/librustc_target/spec/armebv7r_none_eabihf.rs
31:            linker: Some("rust-lld".to_owned()),

src/librustc_target/spec/wasm32_unknown_unknown.rs
54:        linker: Some("rust-lld".to_owned()),

src/librustc_target/spec/msp430_none_elf.rs
34:            linker: Some("msp430-elf-gcc".to_string()),

src/librustc_target/spec/riscv32imc_unknown_none_elf.rs
28:            linker: Some("rust-lld".to_string()),

src/librustc_target/spec/l4re_base.rs
35:        linker: Some("ld".to_string()),

src/librustc_target/spec/aarch64_unknown_none.rs
23:        linker: Some("rust-lld".to_owned()),

src/librustc_target/spec/armv7r_none_eabihf.rs
31:            linker: Some("rust-lld".to_owned()),

src/librustc_target/spec/armv7r_none_eabi.rs
31:            linker: Some("rust-lld".to_owned()),

src/librustc_target/spec/armebv7r_none_eabi.rs
31:            linker: Some("rust-lld".to_owned()),

Windows seems to use something different from LLD:

src/librustc_target/spec/windows_base.rs
77:        linker: Some("gcc".to_string()),

About Mac, there is:
https://github.com/rust-lang/rust/blob/master/src/librustc_target/spec/i686_apple_darwin.rs#L17


Sorry, this turned into a bit of a mess.

Linkers & Loaders is a brilliant resource. This book helped me go from being clueless to understanding what linkers do well enough that I can read linker source code and follow the concepts. Note that the author has recommended that people check out a copy at a library, or maybe even buy the book, because the online content is outdated and therefore contains errors.

Another motivation for having our own linker has to do with Thread Local Storage (#924). Thread-local variables go in the .tdata and .tbss sections, and the linker merges them together. On Linux, libc looks at the auxiliary vector for the PT_TLS program header entry to find out the size of the TLS at runtime. musl libc, for example, preallocates 16 * pointer_size bytes for TLS but then has to call mmap if that isn't enough:

static struct builtin_tls {
    char c;
    struct pthread pt;
    void *space[16];
} builtin_tls[1];

...

    if (libc.tls_size > sizeof builtin_tls) {
#ifndef SYS_mmap2
#define SYS_mmap2 SYS_mmap
#endif
        mem = (void *)__syscall(
            SYS_mmap2,
            0, libc.tls_size, PROT_READ|PROT_WRITE,
            MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
        /* -4095...-1 cast to void * will crash on dereference anyway,
         * so don't bloat the init code checking for error codes and
         * explicitly calling a_crash(). */
    } else {
        mem = builtin_tls;
    }

This all happens before main. On the other hand, if we had our own linker, we could have a special placeholder for the statically allocated TLS array, and thus always avoid this mmap before main.

It works differently on Windows. I haven't looked up how it works there yet.

zimmi commented

This exists now: https://github.com/ziglang/zig/blob/master/src-self-hosted/link.zig

It's far from complete; so far it only addresses the needs of the self-hosted compiler, and no work has been put into making it link arbitrary objects. But that's the direction it is headed. Bugs and additional features can be separate issues from this one.