ibuildthecloud/systemd-docker

Manual Restart

ap4y opened this issue · 12 comments

ap4y commented

I'm having some issues with manual service restarts via systemctl. For some reason doing systemctl restart on a service with systemd-docker doesn't restart it, instead it transitions to the inactive state. With restart policy always manual restart triggers auto-restart of the service. I don't see anything interesting in journal apart from occasional reports from the process of SIGTERM. Maybe you can help me understand or debug this situation. Thanks!

Can you share your unit file?

ap4y commented

Sure thing. This one of them:

[Unit]
Description=Redis (%i)

After=docker.service
Requires=docker.service

After=redis-data@%i.service
Requires=redis-data@%i.service

[Service]
Type=notify
NotifyAccess=all

Restart=on-failure
RestartSec=10s

EnvironmentFile=/etc/environment

ExecStart=/opt/bin/systemd-docker run --rm --volumes-from redis-data --name %p -p 6379:6379 redis:2.8 redis-server --loglevel warning

[X-Fleet]
X-ConditionMachineMetadata="environment=%i"

[Install]
WantedBy=multi-user.target
ap4y commented

Hey @ibuildthecloud I wonder if you have any updates regarding this issue?

I'm unable to reproduce. I'm running on CoreOS 471.1.0 using the following unit file. This is basically the one you posted but I just removed the dependency on redis-data

[Unit]
Description=Redis (%i)

After=docker.service
Requires=docker.service

[Service]
Type=notify
NotifyAccess=all

Restart=on-failure
RestartSec=10s

EnvironmentFile=/etc/environment

ExecStart=/opt/bin/systemd-docker run --rm --name %p -p 6379:6379 redis:2.8 redis-server --loglevel warning

[X-Fleet]
X-ConditionMachineMetadata="environment=%i"

[Install]
WantedBy=multi-user.target

I wrote that to test.service then did systemctl enable $(pwd)/test.service. After doing systemctl start test, systemctl restart test worked as expected.

ap4y commented

I'm using stable version so I'm a bit behind (444.5.0). Also I'm submitting services with fleet, so I wonder if it's somehow related. I will do some testing without fleet. Thanks.

I tried fleet too, but I did not try with stable.

@ap4y did you try running the exact same units without systemd-docker and just docker run? Just remove

Type=notify
NotifyAccess=all
ap4y commented

Yeah, we recently switched to systemd-docker, previously it worked without issues, we also had several rolling updates after that, so I wonder if it's just a problem with this specific version. I'm doing some tests at the moment, I will let you know what I will find.

ap4y commented

I did some tests, so far:

  1. Restart of the unit with systemd-docker doesn't work, tried with and without fleet;
  2. Restart of the unit with docker works, tried with and without fleet.

So the only suspect is a CoreOS version. I may try to switch one of the machines to the alpha channel.

ap4y commented

I tested with alpha version and I can confirm that it works. Thanks for help.

ap4y commented

I'm having this issues once again on 472.0.0, restarts are not working.

I have similar issue (my test unit file can be found at #15).

If I use it with ExecStartPre=/usr/bin/docker run --rm -v /opt/bin:/opt/bin ibuildthecloud/systemd-docker it restarts via systemctl restart sd-test fine. With same file in /opt/bin/systemd-docker (e. g. received by same docker command as in ExecStartPre) and without that line in unit file it starts and stops fine, but restart seems broken.

Reproducable on my ArchLinux box with systemd 217 and docker 1.3.1/1.15 but not on CoreOS 509.1.0 with systemd 215, docker 1.3.2/1.15.

I have no ideas what is broken. But i see some strange pids in systemctl status/journalctl output:

● sd-test.service - test
   Loaded: loaded (/etc/systemd/system/sd-test.service; disabled)
   Active: active (running) since Wed 2014-11-26 18:59:05 MSK; 5s ago
 Main PID: 24547 (sh)
   CGroup: /system.slice/sd-test.service
           ├─24511 /opt/bin/systemd-docker run --rm --name sd-test.service busybox /bin/sh -c while true ; do date ; sleep 5 ; done
           ├─24547 /bin/sh -c while true ; do date ; sleep 5 ; done
           └─24578 sleep 5

Nov 26 18:59:05 localhost systemd-docker[24511]: 2014/11/26 18:59:05 Moving pid 24547 to /sys/fs/cgroup/cpu,cpuacct/system.slice/sd-test.service/cgroup.procs
Nov 26 18:59:05 localhost systemd-docker[24511]: 2014/11/26 18:59:05 Moving pid 24561 to /sys/fs/cgroup/cpu,cpuacct/system.slice/sd-test.service/cgroup.procs
Nov 26 18:59:05 localhost systemd-docker[24511]: 2014/11/26 18:59:05 Moving pid 24547 to /sys/fs/cgroup/systemd/system.slice/sd-test.service/cgroup.procs
Nov 26 18:59:05 localhost systemd-docker[24511]: 2014/11/26 18:59:05 Moving pid 24561 to /sys/fs/cgroup/systemd/system.slice/sd-test.service/cgroup.procs
Nov 26 18:59:05 localhost systemd-docker[24511]: 2014/11/26 18:59:05 Moving pid 24547 to /sys/fs/cgroup/blkio/system.slice/sd-test.service/cgroup.procs
Nov 26 18:59:05 localhost systemd-docker[24511]: 2014/11/26 18:59:05 Moving pid 24561 to /sys/fs/cgroup/blkio/system.slice/sd-test.service/cgroup.procs
Nov 26 18:59:05 localhost systemd-docker[24511]: 2014/11/26 18:59:05 Moving pid 24547 to /sys/fs/cgroup/freezer/cgroup.procs
Nov 26 18:59:05 localhost systemd-docker[24511]: 2014/11/26 18:59:05 Moving pid 24561 to /sys/fs/cgroup/freezer/cgroup.procs
Nov 26 18:59:05 localhost systemd-docker[24511]: Wed Nov 26 15:59:05 UTC 2014
Nov 26 18:59:10 localhost systemd-docker[24511]: Wed Nov 26 15:59:10 UTC 2014

After restart attempt those lines are added:

Nov 26 18:59:13 localhost systemd[1]: sd-test.service: Supervising process 24547 which is not our child. We'll most likely not notice when it exits.
Nov 26 18:59:13 localhost systemd-docker[24587]: Wed Nov 26 15:59:05 UTC 2014
Nov 26 18:59:13 localhost systemd-docker[24587]: Wed Nov 26 15:59:10 UTC 2014

Looks like some process duplicated output of systemd-docker (which was pid 24511), but it has strange pid not present in logs before.

On CoreOS it looks like:

Nov 26 16:46:15 core-01 systemd[1]: Starting test...
Nov 26 16:46:15 core-01 systemd[1]: sd-test.service: Supervising process 5698 which is not our child. We'll most likely not notice when it exits.
Nov 26 16:46:15 core-01 systemd[1]: Started test.
Nov 26 16:46:15 core-01 systemd-docker[5682]: 2014/11/26 16:46:15 Moving pid 5698 to /sys/fs/cgroup/blkio/system.slice/sd-test.service/cgroup.procs
Nov 26 16:46:15 core-01 systemd-docker[5682]: 2014/11/26 16:46:15 Moving pid 5720 to /sys/fs/cgroup/blkio/system.slice/sd-test.service/cgroup.procs
Nov 26 16:46:15 core-01 systemd-docker[5682]: 2014/11/26 16:46:15 Moving pid 5698 to /sys/fs/cgroup/freezer/cgroup.procs
Nov 26 16:46:15 core-01 systemd-docker[5682]: 2014/11/26 16:46:15 Moving pid 5720 to /sys/fs/cgroup/freezer/cgroup.procs
Nov 26 16:46:15 core-01 systemd-docker[5682]: 2014/11/26 16:46:15 Moving pid 5698 to /sys/fs/cgroup/devices/system.slice/sd-test.service/cgroup.procs
Nov 26 16:46:15 core-01 systemd-docker[5682]: 2014/11/26 16:46:15 Moving pid 5720 to /sys/fs/cgroup/devices/system.slice/sd-test.service/cgroup.procs
Nov 26 16:46:15 core-01 systemd-docker[5682]: 2014/11/26 16:46:15 Moving pid 5698 to /sys/fs/cgroup/memory/system.slice/sd-test.service/cgroup.procs
Nov 26 16:46:15 core-01 systemd-docker[5682]: 2014/11/26 16:46:15 Moving pid 5720 to /sys/fs/cgroup/memory/system.slice/sd-test.service/cgroup.procs
Nov 26 16:46:15 core-01 systemd-docker[5682]: 2014/11/26 16:46:15 Moving pid 5698 to /sys/fs/cgroup/cpu,cpuacct/system.slice/sd-test.service/cgroup.procs
Nov 26 16:46:15 core-01 systemd-docker[5682]: 2014/11/26 16:46:15 Moving pid 5720 to /sys/fs/cgroup/cpu,cpuacct/system.slice/sd-test.service/cgroup.procs
Nov 26 16:46:15 core-01 systemd-docker[5682]: 2014/11/26 16:46:15 Moving pid 5698 to /sys/fs/cgroup/systemd/system.slice/sd-test.service/cgroup.procs
Nov 26 16:46:15 core-01 systemd-docker[5682]: 2014/11/26 16:46:15 Moving pid 5720 to /sys/fs/cgroup/systemd/system.slice/sd-test.service/cgroup.procs
Nov 26 16:46:15 core-01 systemd-docker[5682]: Wed Nov 26 16:46:15 UTC 2014
Nov 26 16:46:20 core-01 systemd-docker[5682]: Wed Nov 26 16:46:20 UTC 2014
Nov 26 16:46:22 core-01 systemd[1]: Stopping test...
Nov 26 16:46:22 core-01 systemd[1]: Starting test...
Nov 26 16:46:22 core-01 systemd[1]: sd-test.service: Supervising process 5752 which is not our child. We'll most likely not notice when it exits.
Nov 26 16:46:22 core-01 systemd[1]: Started test.
Nov 26 16:46:22 core-01 systemd-docker[5735]: 2014/11/26 16:46:22 Moving pid 5752 to /sys/fs/cgroup/blkio/system.slice/sd-test.service/cgroup.procs
Nov 26 16:46:22 core-01 systemd-docker[5735]: 2014/11/26 16:46:22 Moving pid 5752 to /sys/fs/cgroup/freezer/cgroup.procs
Nov 26 16:46:22 core-01 systemd-docker[5735]: 2014/11/26 16:46:22 Moving pid 5752 to /sys/fs/cgroup/devices/system.slice/sd-test.service/cgroup.procs
Nov 26 16:46:22 core-01 systemd-docker[5735]: 2014/11/26 16:46:22 Moving pid 5752 to /sys/fs/cgroup/memory/system.slice/sd-test.service/cgroup.procs
Nov 26 16:46:22 core-01 systemd-docker[5735]: 2014/11/26 16:46:22 Moving pid 5752 to /sys/fs/cgroup/cpu,cpuacct/system.slice/sd-test.service/cgroup.procs
Nov 26 16:46:22 core-01 systemd-docker[5735]: 2014/11/26 16:46:22 Moving pid 5752 to /sys/fs/cgroup/systemd/system.slice/sd-test.service/cgroup.procs
Nov 26 16:46:22 core-01 systemd-docker[5735]: Wed Nov 26 16:46:22 UTC 2014
Nov 26 16:46:27 core-01 systemd-docker[5735]: Wed Nov 26 16:46:27 UTC 2014