kinvolk/kube-spawn

Starting a cluster fails due to a race when mounting/unmounting /var/lib/machines

dongsupark opened this issue · 0 comments

It's basically the same issue as #284, which I closed last week. Unfortunately, the problem is still not resolved with systemd v239 and the current master.

Here is what I have observed:

  • When creating & starting a cluster from a clean state, everything works fine. Stopping the cluster afterwards also works fine.
  • After destroying the cluster, create & start a cluster again. In that case, the start action sometimes fails with "Failed to start cluster".

When I apply the patch from #284, it works fine and the failure no longer occurs. However, I'm not sure I like that workaround.

I'm aware that it happens because unmounting /var/lib/machines fails with -EBUSY. However, I have not yet been able to figure out exactly which part of systemd causes this occasional failure. We should probably find an easy way to reproduce the issue.