containerd/cgroups

NewSystemd must not ignore UnitExists

kolyshkin opened this issue · 3 comments

This is the same issue as described in opencontainers/runc#3780

In short, NewSystemd ignores UnitExists error from systemd, as a result the PID is not added to the proper systemd unit and thus the cgroup.

See the above issue for details as to how that may happen.

See the opencontainers/runc#3782 as to how this might be fixed.

I'll take a look at this

I assume that this is the block that you're talking about refactoring? In a nutshell, are you proposing the following:

  1. If you get a unit exists error, reset the systemd unit like here and try again.
  2. If the retry fails, return an error rather than quietly complete.

I'll see if I can throw together a PR. Seems straightforward enough, if my understanding is correct...

I assume that this is the block that you're talking about refactoring? In a nutshell, are you proposing the following:

  1. If you get a unit exists error, reset the systemd unit like here and try again.
  2. If the retry fails, return an error rather than quietly complete.

I'll see if I can throw together a PR. Seems straightforward enough, if my understanding is correct...

@mmerkes yes, your understanding is correct. The most straightforward fix would be to merely return a "unit already exists" error and when fix the callers who do not expect it. The slightly more elaborated one is to call reset-failed (which will remove the failed/non-running unit, if exists) and retry, like I did in runc/libcontainer/cgroups.