icecc/icecream

Remote linking thoughts

baybal opened this issue · 6 comments

Hello,

As I see it, there have been a number of bug reports where people discussed the possibility of remote linking and debated its (in)feasibility. I myself often see linking being the slowest part of the build, especially on huge C++ projects.

I went through the ideas that came up, and I see two major obstacles:

  1. Binaries to be linked have to make a round trip, possibly several.
  2. All files to be linked have to be sent to the compile servers.

First, can we actually brute-force the round-trip issue with modern 1 Gb/s+ network hardware? Second, can we leave the unlinked object files in place on the compile servers until they are needed for linking?
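For a rough sense of scale, here is a back-of-envelope calculation; the object and output sizes below are pure assumptions for illustration, not measurements from any real project:

```python
# Back-of-envelope only: every size here is an assumption for illustration.
LINK_SPEED_GBPS = 1.0                       # the "1 Gb/s+" network from the question
BYTES_PER_SECOND = LINK_SPEED_GBPS * 1e9 / 8

objects_bytes = 500 * 1024**2               # assumed total size of the .o files for one link
binary_bytes = 1 * 1024**3                  # assumed size of the linked output (debug build)

send_objects = objects_bytes / BYTES_PER_SECOND
return_binary = binary_bytes / BYTES_PER_SECOND

print(f"objects out: {send_objects:.1f} s, "
      f"binary back: {return_binary:.1f} s, "
      f"total transfer: {send_objects + return_binary:.1f} s")
```

Even at line rate that is on the order of ten seconds of pure transfer per link under these assumptions, which is why keeping the object files on the compile servers matters.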

Even if we have to resend some object files from one node to another, wouldn't that still be much better than O(n) replication of all compiled files? I think even the latter may make sense: just mirror every file any part of the toolchain touches (including the linker) on all nodes in some caching system.

For the latter, I think some hash-keyed storage may well make sense, so you could even reuse results from previous builds, like ccache does.
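As a rough illustration of that hash-keyed storage (this is only a sketch, not icecream code; the cache directory is a made-up location):

```python
# A minimal sketch, not icecream code: a content-addressed cache keyed by the
# hash of an object file's contents, along the lines of the ccache-style reuse
# described above. The cache directory is a made-up location.
import hashlib
import shutil
from pathlib import Path

CACHE_DIR = Path("/tmp/objcache")      # hypothetical cache location

def content_hash(path: Path) -> str:
    """Return the SHA-256 of the file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def store(obj: Path) -> str:
    """Copy an object file into the cache under its content hash and return the key."""
    key = content_hash(obj)
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    shutil.copy2(obj, CACHE_DIR / key)
    return key

def fetch(key: str, dest: Path) -> bool:
    """Retrieve a previously stored file by hash; return False on a cache miss."""
    cached = CACHE_DIR / key
    if not cached.exists():
        return False
    shutil.copy2(cached, dest)
    return True
```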

As I see it, there have been a number of bug reports where people discussed the possibility of remote linking and debated its (in)feasibility. I myself often see linking being the slowest part of the build, especially on huge C++ projects.

In that case a good idea is to check the linking step itself. Many distributions ship the BFD ld linker as the default, but LLD or Gold perform far better. With debug builds, using split DWARF debug info also makes a noticeable difference.
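For example, with GCC or Clang switching the linker and enabling split DWARF is just a couple of flags. A minimal illustrative driver (the source and output names are invented, and it assumes LLD is installed):

```python
# Illustrative only: the flags -gsplit-dwarf and -fuse-ld=lld are standard
# GCC/Clang options; the file names are made up for the example.
import subprocess

subprocess.run(
    ["g++", "-c", "-g", "-gsplit-dwarf",    # put debug info into a side .dwo file
     "main.cpp", "-o", "main.o"],
    check=True,
)
subprocess.run(
    ["g++", "-fuse-ld=lld",                 # link with LLD instead of BFD ld
     "main.o", "-o", "app"],
    check=True,
)
```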

First, can we actually brute-force the round-trip issue with modern 1 Gb/s+ network hardware?

You can try. My guess is that if your link is so huge that it takes this long, sending everything back and forth will take about as long.

I'm already using Gold. The software I am talking about is WebKit.

You can try. My guess is that if your link is so huge that it takes this long, sending everything back and forth will take about as long.

What I tried was making simultaneous ssh transfer round trips with a 1 MB file in a loop to 28 servers. I barely reach a quarter of my throughput.
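For reference, a rough sketch of that kind of test (host addresses and file paths are placeholders, and it assumes passwordless ssh/scp to the build machines):

```python
# Sketch of a concurrent scp round-trip test; hosts and paths are hypothetical.
from concurrent.futures import ThreadPoolExecutor
import subprocess
import time

HOSTS = [f"10.0.2.{i}" for i in range(101, 129)]   # 28 hypothetical build hosts
PAYLOAD = "/tmp/payload-1mb.bin"                    # assumed 1 MB test file

def round_trip(host: str) -> float:
    """Copy the payload to a host and back, returning the elapsed wall time."""
    start = time.monotonic()
    subprocess.run(["scp", "-q", PAYLOAD, f"{host}:/tmp/"], check=True)
    subprocess.run(["scp", "-q", f"{host}:/tmp/payload-1mb.bin",
                    f"/tmp/back-{host}.bin"], check=True)
    return time.monotonic() - start

with ThreadPoolExecutor(max_workers=len(HOSTS)) as pool:
    for host, secs in zip(HOSTS, pool.map(round_trip, HOSTS)):
        print(f"{host}: {secs:.3f} s")
```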

I'm quite certain that with a big enough cluster, remote linking makes sense even if the client is a very beefy PC, especially for C++ projects where linking gets CPU-bound even on the latest hardware.

@llunak The following idea came to me today: what if the central server maintained a hash table of "which file was compiled where" and had the compile servers cache their intermediary output?

Then, when a compile request arrives, the central server can dispatch a command like "try to fetch the file with hash 123ABABAB123132 from another compile server, 10.0.2.121".

This will introduce a single round trip between the client and the compile server in case it can't fetch the file (cache eviction, etc.), but that would be at most a few milliseconds of loss.

I believe this will also be handy for supporting other languages like Rust, which output a lot of intermediary files.
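A minimal sketch of that dispatch idea (this is not the actual icecream scheduler; the Scheduler and FetchHint names are invented for illustration):

```python
# Sketch only: a central table of "which daemon cached which content hash",
# used to hint where a compile server might fetch a file from.
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

@dataclass
class FetchHint:
    content_hash: str
    peer: str                      # daemon believed to hold the cached file

class Scheduler:
    def __init__(self) -> None:
        # content hash -> address of the daemon that last cached it
        self.locations: dict[str, str] = {}

    def record(self, content_hash: str, daemon: str) -> None:
        """A daemon reports that it cached an output with this hash."""
        self.locations[content_hash] = daemon

    def dispatch(self, content_hash: str, target: str) -> Optional[FetchHint]:
        """Tell the target daemon where it may try to fetch the file from.

        If the peer no longer has it (cache eviction etc.), the target falls
        back to asking the client, costing one extra round trip.
        """
        peer = self.locations.get(content_hash)
        if peer is None or peer == target:
            return None
        return FetchHint(content_hash, peer)

# Example: 10.0.2.121 cached hash 123ABABAB123132; a later job scheduled on
# 10.0.2.122 gets a hint to try fetching the file from there first.
sched = Scheduler()
sched.record("123ABABAB123132", "10.0.2.121")
print(sched.dispatch("123ABABAB123132", "10.0.2.122"))
```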

You're reproducing my thoughts from #504 (and #138). You won't have much luck convincing the maintainers unless you provide a working patch.

@baybal As it turns out, the company I work for - MongoDB - would also like remote linking functionality, so you have convinced at least one maintainer. However, I don't think Icecream can do remote linking because it's structured to be a drop-in compiler replacement, not a general-purpose remote execution daemon. I think we can do something to abstract away remote execution more while working on #172 and then introduce a new front-end command that runs ld remotely.

However, I am also reasonably sure we will encounter more gotchas around remote linking that will make this harder to accomplish than we realize, because the system becomes even more involved in the linking process than it is when compiling, and I don't think we cover the system effects during compilation well enough right now either. This will require more thought.

I would actually also be interested in remote linking. I often have 200 distinct binaries, each involving a big link, and I need to limit parallelism due to the amount of memory used. Spreading the work to machines I have for other purposes, but which still have resources available, would be very useful.