icecc/icecream

frequent drop-outs when used over WIFI/VPN

milianw opened this issue · 1 comments

Setup:

  • client: Arch Linux, icecream 1.3.1-1 from AUR
    /usr/lib/icecream/sbin/iceccd -u icecream -l /var/log/icecream/iceccd --nice 5 -s 192.168.150.185 -n ICECREAM -b /var/cache/icecream
  • server: openSUSE Tumbleweed as far as I know; I can ask my colleague for the exact version if that matters

I'm using icecc from home over Wi-Fi. To make things worse, the compile cluster is in the office, which I reach over VPN. My home connection is relatively fast fiber (100 Mbit/s), and the Wi-Fi connection is also relatively good.

Still, in this configuration icecc often runs into errors; I suspect it is not very resilient to such a setup. It's still faster than compiling without icecc on my laptop, but it doesn't seem to saturate my network uplink or downlink.

Here are some excerpts from the ICECC warnings during a compile job:

no server found 192.168.151.248

ICECC[307384] 2020-11-14 12:20:38: no server found behind given hostname 192.168.151.248:10245
ICECC[307384] 2020-11-14 12:20:38: got exception Error 2 - no server found at 192.168.151.248 (192.168.151.248)

This is quite odd. This seems to be the VPN IP for the machine from which I'm starting the compile job, at least according to icemon. I can ping that address fine too:

$ ping 192.168.151.248 
PING 192.168.151.248 (192.168.151.248) 56(84) bytes of data.
64 bytes from 192.168.151.248: icmp_seq=1 ttl=64 time=6.67 ms
$ dig 192.168.151.248
; <<>> DiG 9.16.8 <<>> 192.168.151.248
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 10640
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;192.168.151.248.               IN      A

;; Query time: 30 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Sa Nov 14 12:33:49 CET 2020
;; MSG SIZE  rcvd: 44

So I'm not really clear what this means...
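
Two details are worth separating here. `ping` only shows that ICMP reaches the address, while the ICECC warning refers to TCP port 10245. And `dig` with a bare IP address asks for an A record for the literal name `192.168.151.248.` (see the QUESTION SECTION), so NXDOMAIN is expected and says nothing about reachability; a reverse lookup would be `dig -x`. Below is a minimal sketch of a check closer to what the warning describes, assuming that a plain TCP connect to the port from the log line is a fair probe. The helper name `can_connect` and the 3-second timeout are illustrative choices, not part of icecream:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # An IP literal is used as-is; no DNS lookup is involved.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

Calling `can_connect("192.168.151.248", 10245)` while a build is running would show whether something accepts connections on that port at that moment, which `ping` cannot tell.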

got exception Error 1

This warning shows up very often during compile jobs, from dozens up to hundreds of times depending on how many translation units I'm compiling:

ICECC[307290] 2020-11-14 12:21:19: got exception Error 1 - expected use_cs reply, but got 0 instead (this should be an exception!)
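
To put a number on "very often", the warnings can be tallied by error number. This is a rough sketch that assumes every warning follows the `ICECC[pid] date time: got exception Error N - ...` shape seen in this report; the function name `tally_exceptions` is made up for illustration:

```python
import re
from collections import Counter

# Matches lines such as:
# ICECC[307290] 2020-11-14 12:21:19: got exception Error 1 - expected use_cs reply, ...
EXCEPTION_RE = re.compile(r"^ICECC\[\d+\] \S+ \S+: got exception Error (\d+)\b")


def tally_exceptions(lines):
    """Count 'got exception' warnings, keyed by their error number."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.match(line.strip())
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Feeding it the captured stderr of a build, one line at a time, gives a count per error number that can be compared between a wired and a Wi-Fi/VPN run.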

more breakage

Then I sometimes see blocks like this:

ICECC[307263] 2020-11-14 12:21:19: got exception Error 1 - expected use_cs reply, but got 0 instead (this should be an exception!)
ICECC[307252] 2020-11-14 12:21:21: flush_writebuf() failed(Error: Broken pipe)
ICECC[307252] 2020-11-14 12:21:26: Remote status (compiled on 127.0.0.1): /var/cache/icecream/target=x86_64/75ecaef30c21fdd225cc09e3798f3ffc/usr/bin/as is not executable, installed environment removed?
ICECC[307252] 2020-11-14 12:21:26: got exception Error 23 - Remote status (compiled on 127.0.0.1)
ICECC[307250] 2020-11-14 12:21:20: flush_writebuf() failed(Error: Broken pipe)
ICECC[307250] 2020-11-14 12:21:26: Remote status (compiled on 127.0.0.1): /var/cache/icecream/target=x86_64/75ecaef30c21fdd225cc09e3798f3ffc/usr/bin/as is not executable, installed environment removed?
ICECC[307250] 2020-11-14 12:21:26: got exception Error 23 - Remote status (compiled on 127.0.0.1)
ICECC[307262] 2020-11-14 12:21:27: flush_writebuf() failed(Error: Broken pipe)
ICECC[307262] 2020-11-14 12:21:29: flush_writebuf() failed(Error: Connection reset by peer)
ICECC[307262] 2020-11-14 12:21:29: remote status: /var/cache/icecream/target=x86_64/75ecaef30c21fdd225cc09e3798f3ffc/usr/bin/as is not executable, installed environment removed?
ICECC[307262] 2020-11-14 12:21:29: write of source chunk to host 127.0.0.1
ICECC[307262] 2020-11-14 12:21:29: failed (Error: Connection reset by peer)
ICECC[307262] 2020-11-14 12:21:29: got exception Error 15 - write to host failed (127.0.0.1) 
ICECC[307257] 2020-11-14 12:21:30: flush_writebuf() failed(Error: Broken pipe)
ICECC[307257] 2020-11-14 12:21:30: flush_writebuf() failed(Error: Broken pipe)
ICECC[307257] 2020-11-14 12:21:30: remote status: /var/cache/icecream/target=x86_64/75ecaef30c21fdd225cc09e3798f3ffc/usr/bin/as is not executable, installed environment removed?
ICECC[307257] 2020-11-14 12:21:30: write of source chunk to host 127.0.0.1
ICECC[307257] 2020-11-14 12:21:30: failed (Error: Broken pipe)
ICECC[307257] 2020-11-14 12:21:30: got exception Error 15 - write to host failed (127.0.0.1) 
ICECC[308205] 2020-11-14 12:21:54: no server found behind given hostname 192.168.151.248:10245
ICECC[308205] 2020-11-14 12:21:54: got exception Error 2 - no server found at 192.168.151.248 (192.168.151.248) 
ICECC[308348] 2020-11-14 12:22:06: no server found behind given hostname 192.168.151.248:10245
ICECC[308348] 2020-11-14 12:22:06: got exception Error 2 - no server found at 192.168.151.248 (192.168.151.248) 
ICECC[309005] 2020-11-14 12:22:29: flush_writebuf() failed(Error: Broken pipe)
ICECC[308940] 2020-11-14 12:22:29: flush_writebuf() failed(Error: Broken pipe)

This is very odd too. Again, 127.0.0.1 and 192.168.151.248 should both be the same machine, so I don't quite understand why it sometimes uses one and sometimes the other. I also don't understand how a write to that host could ever fail.

Furthermore, I notice that the folder /var/cache/icecream/target=x86_64 does not exist at all on my local machine. Should I create it manually? I thought icecc would do that as needed. But for a local job that I start myself, it should just use the local binaries directly instead of going through the scheduler, no?

Is icecream maybe getting confused by the multiple IP addresses of my local machine, i.e. the one for the normal Wi-Fi internet connection and the one for the VPN connection?
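
One way to see which of the local addresses is actually used on the route to the scheduler is to let the kernel pick a source address for it. The sketch below does that with a connected UDP socket. The helper name `source_address_for` is illustrative, the port value is arbitrary (it does not affect route selection), and whether iceccd announces this same address to the scheduler is an assumption here, not something confirmed from the icecream sources:

```python
import socket


def source_address_for(target: str, port: int = 10245) -> str:
    """Return the local IPv4 address the kernel would use to reach target."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends no packets; it only makes the
        # kernel select a route and therefore a source address.
        sock.connect((target, port))
        return sock.getsockname()[0]
```

With the scheduler address from the setup above, `source_address_for("192.168.150.185")` should return the VPN address if traffic to the cluster goes through the tunnel, and the Wi-Fi address otherwise.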

icecc log messages

My icecc log basically looks like this:

[297382] 2020-11-14 12:21:26: scheduler dead ?!
[297382] 2020-11-14 12:21:28: scheduler dead ?!
[297382] 2020-11-14 12:21:29: scheduler dead ?!
[307821] 2020-11-14 12:21:29: I don't have environment 75ecaef30c21fdd225cc09e3798f3ffc(x86_64) 46
[297382] 2020-11-14 12:21:29: scheduler dead ?!
[297382] 2020-11-14 12:21:29: scheduler dead ?!
[297382] 2020-11-14 12:21:30: scheduler dead ?!
[307826] 2020-11-14 12:21:30: I don't have environment 75ecaef30c21fdd225cc09e3798f3ffc(x86_64) 45
[297382] 2020-11-14 12:21:30: scheduler dead ?!
[297382] 2020-11-14 12:21:30: scheduler dead ?!
...

The "scheduler dead ?!" message is repeated many times.
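
To see whether the "scheduler dead ?!" lines come in bursts or at a steady rate, they can be counted per timestamp. This is a small sketch that assumes every log line has the `[pid] date time: message` shape of the excerpt above; the function name `messages_per_second` is made up for illustration:

```python
import re
from collections import Counter

# Matches lines such as: [297382] 2020-11-14 12:21:26: scheduler dead ?!
LOG_RE = re.compile(r"^\[(\d+)\] (\S+ \S+): (.*)$")


def messages_per_second(lines, needle="scheduler dead ?!"):
    """Count how often `needle` is logged for each one-second timestamp."""
    per_second = Counter()
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match and match.group(3) == needle:
            per_second[match.group(2)] += 1
    return per_second
```

It can be run over the file given to `-l` in the setup (`/var/log/icecream/iceccd`), assuming that is the log shown here, and the busiest seconds can then be lined up with the timestamps of the client-side warnings.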

Any input would be appreciated.