cncf/demo

Get distcc compiles working


Specifically:

  • pump mode
  • reliably
  • instrumented

Arch wiki is a good resource as usual:
https://wiki.archlinux.org/index.php/Distcc

The man page:
https://linux.die.net/man/1/distcc

Some common sense first: opting for Debian stable.

The distcc slaves should start first so that the master can receive DISTCC_HOSTS/DISTCC_POTENTIAL_HOSTS as an environment variable, avoiding overly complicated discovery.
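A minimal sketch of the master side, assuming the slaves' pod IPs are already known by the time the master starts. It builds a DISTCC_HOSTS value in distcc's `HOST[/LIMIT][,OPTIONS]` syntax, using the `,cpp,lzo` options (pump mode plus LZO compression) that also appear in the log later in this thread. The helper name and the per-host job limit are hypothetical.

```python
import os


def build_distcc_hosts(slave_ips, jobs_per_host=None, pump=True):
    """Build a DISTCC_HOSTS value from slave IPs (hypothetical helper).

    Each entry follows HOST[/LIMIT][,OPTIONS]: ",cpp" enables pump mode
    for that host and ",lzo" enables LZO compression.
    """
    if not slave_ips:
        raise ValueError("need at least one slave")
    options = ",cpp,lzo" if pump else ",lzo"
    entries = []
    for ip in slave_ips:
        # Optional cap on concurrent jobs sent to this host.
        limit = f"/{jobs_per_host}" if jobs_per_host else ""
        entries.append(f"{ip}{limit}{options}")
    # distcc expects the host list to be whitespace-separated.
    return " ".join(entries)


if __name__ == "__main__":
    # Hypothetical IPs; in the cluster these would come from the slave pods.
    hosts = build_distcc_hosts(["10.244.0.6", "10.244.0.7"], jobs_per_host=4)
    os.environ["DISTCC_HOSTS"] = hosts
    print(hosts)
```

Passing the list through the environment keeps the master image generic: the same image works for any number of slaves.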

I'll add anything else I'm missing in the next comment.

distccd wants an ALLOWEDNETS environment variable listing the networks its peers (the clients allowed to connect) live on.

I'm going to cheat and just proceed with 10.244.0.0/16, which is right for my cluster, but this shouldn't be hard-coded. Where exactly to pull the cluster subnets from is an interesting aside that I'll add to the backlog rather than linger on now.
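On Debian, the distcc init script roughly turns each entry of ALLOWEDNETS into a distccd `--allow` option. A minimal sketch of that translation, plus a check that a pod IP such as 10.244.0.6 actually falls inside the hard-coded 10.244.0.0/16; both helper names are hypothetical and Debian's script may differ in detail.

```python
import ipaddress


def allow_args(allowednets):
    """Turn an ALLOWEDNETS-style string into distccd --allow arguments."""
    args = []
    for net in allowednets.split():
        # Validate each entry; raises ValueError if it is malformed.
        # strict=False tolerates host bits being set, e.g. "10.244.1.5/16".
        ipaddress.ip_network(net, strict=False)
        args += ["--allow", net]
    return args


def is_allowed(client_ip, allowednets):
    """Return True if client_ip falls inside any network in ALLOWEDNETS."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(net, strict=False)
               for net in allowednets.split())
```

For example, `allow_args("127.0.0.1 10.244.0.0/16")` yields `["--allow", "127.0.0.1", "--allow", "10.244.0.0/16"]`, and a check like `is_allowed` is one cheap way to catch a wrong subnet before distccd starts silently rejecting the master.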

https://lists.samba.org/archive/distcc/2007q4/003593.html

There was a zeroconf patch, and it appears to have been merged upstream since it was announced, because even with ZEROCONF="false" I get:

 distcc --show-hosts
distcc[639] (dcc_parse_hosts) Warning: /root/.distcc/zeroconf/hosts contained no hosts; can't distribute work
distcc[639] (dcc_zeroconf_add_hosts) CRITICAL! failed to parse host file.

Zeroconf is generally useful, but it isn't necessary for our purposes, so I'd rather turn it off. Looking for a way to do that.

https://ubuntuforums.org/archive/index.php/t-1747376.html

It seems that there's some bug with distcc/avahi that causes this problem.

Take "+zeroconf" out of the global hosts file (/etc/distcc/hosts) and things should work as expected.

That's correct: distcc now reads /etc/distcc/hosts as desired. Onwards.
EDIT: DISTCC_HOSTS also correctly overrides the hosts file.
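If the image build should apply this fix automatically, a minimal sketch is to filter the `+zeroconf` token out of the hosts file contents before writing them back. The function name is hypothetical; it keeps `#` comments and drops lines that contained nothing but `+zeroconf`.

```python
def strip_zeroconf(hosts_text):
    """Remove the +zeroconf token from distcc hosts-file contents."""
    out = []
    for line in hosts_text.splitlines():
        # Split off any comment so only the host list part is filtered.
        body, hash_, comment = line.partition("#")
        tokens = [t for t in body.split() if t != "+zeroconf"]
        if not tokens and not hash_ and body.strip():
            # The line held only +zeroconf: drop it entirely.
            continue
        new_body = " ".join(tokens)
        if hash_:
            new_body = (new_body + " " if new_body else "") + hash_ + comment
        out.append(new_body)
    trailing = "\n" if hosts_text.endswith("\n") else ""
    return "\n".join(out) + trailing
```

In a build step this would read /etc/distcc/hosts, pass the text through `strip_zeroconf`, and write the result back.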

What's left:

A ReplicationController to start (number of nodes - 1) slave replicas, plus a distcc master.
Need to pass the work to the master somehow; the easy way out for the short term is to bake a job script into the image.
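A minimal sketch of the slave side of that plan, assuming one node is reserved for the master: it builds the ReplicationController as a plain dict (ready to be dumped as JSON or YAML) with `num_nodes - 1` replicas. The image name, labels and resource name are hypothetical; 3632 is distccd's default TCP port.

```python
import json


def slave_rc(num_nodes, image="example/distcc-slave:latest"):
    """Build a ReplicationController manifest for the distcc slaves.

    One node is left for the master, so the slaves get num_nodes - 1
    replicas. Image name and labels are hypothetical placeholders.
    """
    if num_nodes < 2:
        raise ValueError("need at least 2 nodes: one master and one slave")
    labels = {"app": "distcc", "role": "slave"}
    return {
        "apiVersion": "v1",
        "kind": "ReplicationController",
        "metadata": {"name": "distcc-slave"},
        "spec": {
            "replicas": num_nodes - 1,
            "selector": labels,
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "distccd",
                        "image": image,
                        # distccd listens on TCP 3632 by default.
                        "ports": [{"containerPort": 3632}],
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(slave_rc(4), indent=2))
```

Note that a ReplicationController by itself does not guarantee one replica per node; the scheduler is free to co-locate replicas, so the "one slave per node" intent would need extra scheduling constraints.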

Tear down

  • Where to save compile artifacts
  • Can I recycle these pods? I might need to throw them away after each compile. Check this.

For now I might not have to care about tear down. Single shot.

Note on pump mode: apparently it is impossible to use when compiling a kernel.

It starts off trying to distribute to the slaves, then a remote compile comes back with a "wrong result" and the master proceeds to compile locally, ignoring the slaves. For some reason the slaves keep chugging along, presumably on work that has already been thrown away.

So the graphs looked right until I noticed the master had finished while a slave was still chugging along at something, which should be impossible. The only giveaway is a blink-and-you-miss-it note in the distcc log:

distcc[8235] ERROR: compile arch/x86/kernel/asm-offsets.c on 10.244.0.6,cpp,lzo failed
distcc[8235] (dcc_build_somewhere) Warning: remote compilation of 'arch/x86/kernel/asm-offsets.c' failed, retrying locally
distcc[8235] Warning: failed to distribute arch/x86/kernel/asm-offsets.c to 10.244.0.6,cpp,lzo, running locally instead
distcc[8235] (dcc_please_send_email_after_investigation) Warning: remote compilation of 'arch/x86/kernel/asm-offsets.c' failed, retried locally and got a different result.
distcc[8235] (dcc_please_send_email_after_investigation) Warning: file 'include/generated/autoconf.h', a dependency of arch/x86/kernel/asm-offsets.c, changed during the build
distcc[8235] (dcc_note_discrepancy) Warning: now using plain distcc, possibly due to inconsistent file system changes during build

So in theory pump mode would give a nice 30%-50% speed boost, but I haven't found anybody who has used it successfully with the kernel. If you're reading this and have any insight, do get in touch.
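Because the fallback is so easy to miss, one way to make it part of the "instrumented" goal is to scan the client log (for example one written via DISTCC_LOG) for the warnings quoted above. A minimal sketch that relies only on the two message texts seen in this log; other distcc versions may word them differently, and reading "now using plain distcc" as "pump mode dropped" is an interpretation of the message, not documented behaviour.

```python
import re

# Message texts copied from the distcc log excerpt in this thread.
FALLBACK_RE = re.compile(
    r"failed to distribute (\S+) to (\S+), running locally instead")
PLAIN_RE = re.compile(r"Warning: now using plain distcc")


def scan_distcc_log(lines):
    """Summarise remote-compile fallbacks in distcc client log lines.

    Returns the (source file, host spec) pairs that were compiled locally
    instead, and whether distcc reported switching to plain distcc.
    """
    fallbacks = []
    pump_disabled = False
    for line in lines:
        m = FALLBACK_RE.search(line)
        if m:
            fallbacks.append((m.group(1), m.group(2)))
        elif PLAIN_RE.search(line):
            pump_disabled = True
    return {"fallbacks": fallbacks, "pump_disabled": pump_disabled}
```

Exporting the number of fallbacks alongside the CPU graphs would make a run like the one above stand out immediately instead of only through a slave that keeps working after the master is done.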