gluster-blockd and gluster-block-setup not starting after reboot of node: wrong place for 'systemctl enable...' in update-params.sh ?
Hi,
we have a strange issue on OpenShift 3.11 with GlusterFS (latest version installed yesterday):
OpenShift installs GlusterFS, glusterfs-storage pods start successfully.
At some point during the installation procedure glusterfs nodes are rebooted. Afterwards the glusterfs-storage pods don't start.
oc logs glusterfs-xxx shows:
Enabling gluster-block service and updating env. variables
Failed to get D-Bus connection: Connection refused
Failed to get D-Bus connection: Connection refused
If I exec into the container and call
systemctl status glusterd
It's enabled and running.
If I do
systemctl status gluster-blockd
or
systemctl status gluster-block-setup
The service is reported as disabled and doesn't run.
If I call
systemctl enable gluster-blockd
systemctl start gluster-blockd
and then leave the pod, the glusterfs-storage pod is reported as running by the OpenShift oc command after a few seconds.
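For reference, the same workaround can also be run from outside the pod, e.g. (the pod name is a placeholder; use whatever oc get pods shows for the affected node, and add -n for the right project if needed):
oc exec glusterfs-storage-xxx -- systemctl enable gluster-blockd
oc exec glusterfs-storage-xxx -- systemctl start gluster-blockd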
I can get the same result by deleting the glusterfs-storage pod. Because the pods belong to a DaemonSet in OpenShift, they are restarted immediately and come up running. The logs don't show any D-Bus errors in this case!
It looks like some weird race condition where the GlusterFS container behaves differently after a node reboot than after a pod restart.
Best regards,
Josef
The services gluster-blockd and gluster-block-setup are both started correctly after a reboot if the commands
systemctl enable gluster-blockd
systemctl enable gluster-block-setup
are called in the Dockerfile and not in the script update-params.sh ! :-)
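For illustration, the earlier behaviour corresponds to an image-build step roughly like the following (a sketch only, not the exact line from the gluster-containers Dockerfile):
RUN systemctl enable gluster-blockd gluster-block-setup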
That was already the case in a former version of the Docker image but got changed a few days ago.
Could you confirm that, please?
Yes, the systemctl enable calls were recently moved from the Dockerfile to update-params.sh. This was done to introduce a new variable, GLUSTER_BLOCK_ENABLED.
By any chance, have you passed GLUSTER_BLOCK_ENABLED=FALSE ?
@SaravanaStorageNetwork
Thanks for your answer. I don't touch the GLUSTER_BLOCK_ENABLED variable. It must be set to TRUE, because I see the echo message from the IF branch of the if ... then ... else, and I also see the two D-Bus errors, one for each of the systemctl commands.
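For context, the conditional in update-params.sh that this refers to has roughly the following shape (paraphrased for illustration, not the verbatim script):
if [ "$GLUSTER_BLOCK_ENABLED" = "TRUE" ]; then
    echo "Enabling gluster-block service and updating env. variables"
    # these two calls are what fail with "Failed to get D-Bus connection" after a node reboot
    systemctl enable gluster-blockd
    systemctl enable gluster-block-setup
fi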
It definitely seems that D-Bus is not available when the container starts after a node reboot. If I restart the container manually by killing it, both systemctl commands run successfully without D-Bus errors.
I created a new version of the Docker image with both systemctl enable commands back in the Dockerfile. With this image the glusterfs-storage containers also start successfully after a reboot.
Thanks for the pointers. This looks a little weird; I will get back on this after some debugging.
@SaravanaStorageNetwork
To be more precise: I didn't restart a "GlusterFS-Storage" container but the glusterfs-storage pod.
I get the same issue - after a node/host reboot the glusterfs container fails its health check:
Events:
Type Reason Age From Message
Normal SandboxChanged 39m kubelet, k8snode01 Pod sandbox changed, it will be killed and re-created.
Normal Pulled 39m kubelet, k8snode01 Container image "gluster/gluster-centos:latest" already present on machine
Normal Created 39m kubelet, k8snode01 Created container
Normal Started 38m kubelet, k8snode01 Started container
Warning Unhealthy 38m kubelet, k8snode01 Readiness probe failed: /usr/local/bin/status-probe.sh
failed check: systemctl -q is-active glusterd.service
Warning Unhealthy 38m kubelet, k8snode01 Liveness probe failed: /usr/local/bin/status-probe.sh
failed check: systemctl -q is-active glusterd.service
Warning Unhealthy 34m (x9 over 37m) kubelet, k8snode01 Liveness probe failed: /usr/local/bin/status-probe.sh
failed check: systemctl -q is-active gluster-blockd.service
Warning Unhealthy 3m (x81 over 37m) kubelet, k8snode01 Readiness probe failed: /usr/local/bin/status-probe.sh
failed check: systemctl -q is-active gluster-blockd.service
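The failing probe check can be repeated by hand from the node to confirm the unit state (the container ID is a placeholder; take it from docker ps):
docker exec <container-id> systemctl -q is-active gluster-blockd.service || echo "gluster-blockd is not active"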
and looking at the docker logs:
[root@k8snode01 ~]# docker logs 2124ac6f2e00
Enabling gluster-block service and updating env. variables
Failed to get D-Bus connection: Connection refused
Failed to get D-Bus connection: Connection refused
Exec into the container and glusterd reports that it is running:
[root@k8snode01 ~]# docker exec -it ae80e12af2e0 systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2019-01-11 13:26:06 UTC; 13min ago
Process: 141 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 142 (glusterd)
CGroup: /kubepods/burstable/poda9ebe292-14e8-11e9-be34-005056beca32/ae80e12af2e033db097d8e9906128a7e371d3364c0e2bdb40809e03268f5df64/system.slice/glusterd.service
├─142 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO...
├─180 /usr/sbin/glusterfs -s localhost --volfile-id gluster/gluste...
├─189 /usr/sbin/glusterfsd -s 10.34.88.166 --volfile-id heketidbst...
├─198 /usr/sbin/glusterfsd -s 10.34.88.166 --volfile-id vol_512ee7...
├─206 /usr/sbin/glusterfsd -s 10.34.88.166 --volfile-id vol_55f566...
├─216 /usr/sbin/glusterfsd -s 10.34.88.166 --volfile-id vol_805eb7...
└─225 /usr/sbin/glusterfsd -s 10.34.88.166 --volfile-id vol_f15984...
Jan 11 13:26:03 k8snode01 systemd[1]: Starting GlusterFS, a clustered file-.....
Jan 11 13:26:06 k8snode01 systemd[1]: Started GlusterFS, a clustered file-s...r.
Hint: Some lines were ellipsized, use -l to show in full.
Doing this fixes it (from the node):
[root@k8snode01 ~]# docker exec -it ae80e12af2e0 systemctl enable gluster-blockd
Created symlink from /etc/systemd/system/multi-user.target.wants/gluster-blockd.service to /usr/lib/systemd/system/gluster-blockd.service.
[root@k8snode01 ~]# docker exec -it ae80e12af2e0 systemctl start gluster-blockd
Ack
@jomeier @kesterriley Thanks! I have reverted the PR which introduced the change (of moving the systemctl commands from the Dockerfile to the shell script).