fgci-org/ansible-role-cuda

testing issue: ubuntu 1604 and avahi and lxc

Closed this issue · 6 comments

Something is up with avahi on LXC.
https://lists.linuxcontainers.org/pipermail/lxc-users/2016-January/010791.html suggests the following fix:

# apt-get install avahi-daemon avahi-utils
... bunch of errors ...
# systemctl disable avahi-daemon
# systemctl stop avahi-daemon
# apt-get autoremove
# apt-get install -f avahi-daemon avahi-utils*

I have not tested this yet.
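For reference, the steps above could be wrapped in a small function so they can run unattended. This is only a sketch of the untested mailing-list suggestion: `run`, `DRY_RUN` and `avahi_lxc_workaround` are hypothetical names that do not appear in the thread, the `-y` flags are added on the assumption that there is no interactive prompt, and the trailing `*` on the last command is left out on the assumption that it is a formatting artifact.

```shell
#!/bin/sh
# Sketch only: wraps the untested mailing-list steps in a function.
# run, DRY_RUN and avahi_lxc_workaround are hypothetical names, not from the thread.

run() {
    # Print the command instead of executing it when DRY_RUN is set.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

avahi_lxc_workaround() {
    # The first install is expected to fail in the container, so ignore its status.
    run apt-get install -y avahi-daemon avahi-utils || true
    run systemctl disable avahi-daemon
    run systemctl stop avahi-daemon
    run apt-get autoremove -y
    # -f asks apt to repair the dependencies left broken by the failed configure step.
    run apt-get install -f -y avahi-daemon avahi-utils
}
```

Setting `DRY_RUN=1` before calling `avahi_lxc_workaround` prints the five commands without touching the system, which is a cheap way to check the sequence before trying it in a container.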

The error (I think) from Travis:

"Setting up libavahi-core7:amd64 (0.6.32rc+dfsg-1ubuntu2) ...",
"Setting up avahi-daemon (0.6.32rc+dfsg-1ubuntu2) ...",
"Job for avahi-daemon.service failed because the control process exited with error code. See "systemctl status avahi-daemon.service" and "journalctl -xe" for details.",
"invoke-rc.d: initscript avahi-daemon, action "start" failed.",
"dpkg: error processing package avahi-daemon (--configure):",
" subprocess installed post-installation script returned error exit status 1",
"dpkg: dependency problems prevent configuration of avahi-utils:",
" avahi-utils depends on avahi-daemon; however:",
" Package avahi-daemon is not configured yet.",
"",
"dpkg: error processing package avahi-utils (--configure):",
" dependency problems - leaving unconfigured",

At the end of the install cuda task, Ansible (or apt) reports this:

"Errors were encountered while processing:", " avahi-daemon", " avahi-utils", " libnss-mdns:amd64"]}
lae commented

FYI I started working on this

lae commented

I kind of did a small hack but I'm not quite sure yet if this was the right solution.

lae@be7f284
https://travis-ci.org/lae/ansible-role-cuda/builds/197455988

I'm doing some more debugging in a different branch.

I should note that I moved a lot of the container-building logic into a separate role to make it more portable across different roles. I hope that doesn't pose an issue for you?

lae commented

To note:

  • tried (possibly amateurishly) to map container uids/gids to different ones on the host, in case the avahi user was conflicting (somehow) with the one on the host
    • doesn't actually seem to be the issue, but it's mentioned in some thread
  • tried creating the avahi user/group before installing avahi, as someone suggested as a solution, but it complained that the user already existed

Also, I wasn't able to reproduce this myself on an AWS Ubuntu 14 instance with LXC. I'm not sure why, so I'm limited to debugging through Travis testing at the moment. I might try to reproduce it on GCE later tonight or something (I forgot that's what Travis was using).

No problem with having the lxc-testing in a different role.

It does seem to be related to the "resources" thing, and maybe dbus?

https://travis-ci.org/CSC-IT-Center-for-Science/ansible-role-cuda/jobs/197554042#L1616-L1619

lae commented

Opened #12 with the solution I mentioned last week: install avahi-daemon, then tell it not to run setrlimit so that the daemon can start.

https://loune.net/2011/02/avahi-setrlimit-nproc-and-lxc/
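The exact change in #12 is not shown in this thread. As a rough sketch of the approach the linked article describes, the `rlimit-nproc` setting in `avahi-daemon.conf` can be commented out so the daemon no longer applies that limit. The function name `disable_rlimit_nproc` is hypothetical, the default path is the usual Debian/Ubuntu location, and GNU `sed -i` is assumed.

```shell
#!/bin/sh
# Sketch only: comment out the rlimit-nproc line in an avahi-daemon.conf.
# disable_rlimit_nproc is a hypothetical helper, not taken from #12.

disable_rlimit_nproc() {
    # Assumed default location of the config on Debian/Ubuntu.
    conf="${1:-/etc/avahi/avahi-daemon.conf}"
    # Prefix the active setting with '#'; lines that are already commented stay as they are.
    sed -i 's/^rlimit-nproc=/#rlimit-nproc=/' "$conf"
}

# Example (as root, inside the container):
#   disable_rlimit_nproc
#   systemctl restart avahi-daemon
```

The helper takes the config path as an optional argument, so it can be tried on a copy of the file first. `avahi-daemon` also has a `--no-rlimits` option that skips all configured limits, which may be an alternative if editing the file is not desirable.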

Looks good now. Thanks!