systemd /run/systemd/private
Arlion opened this issue · 3 comments
After extensive troubleshooting I am at an impasse, and I am hoping there is enough information here to identify the problem.
Symptoms:
During boot, gmond attempts to start but fails. I later discovered that the service pauses for about 3 minutes while it waits for a port to open, and then pauses for another 2 minutes.
Once boot has completed, starting the service manually succeeds.
Troubleshooting:
- Looking at the logs:
  - Oddly, systemd was hanging in such a way that journalctl produced no output. A regular systemctl call issued at the same time as the "systemctl start gmond.service" showed the service as dead.
- Modifying the systemd unit to include additional debug output:
/lib/systemd/system/gmond.service
[Unit]
Description=Ganglia Monitoring Daemon
After=multi-user.target
[Service]
Type=notify
ExecStart=/usr/sbin/gmond
Environment=SYSTEMD_LOG_LEVEL=debug
Requires=dbus.service ## added to ensure dbus service was up before gmond started.
[Install]
WantedBy=multi-user.target
This does not produce any additional logs (there were still none).
I finally wrote a script to hook strace to the process on startup and here it is.
http://paste.fedoraproject.org/423402/47325873/
Here are a few excerpts:
10:07:19 connect(3, {sa_family=AF_LOCAL, sun_path="/run/systemd/private"}, 22) = 0
10:07:19 getsockopt(3, SOL_SOCKET, SO_PEERCRED, {pid=1, uid=0, gid=0}, [12]) = 0
10:07:19 getsockopt(3, SOL_SOCKET, SO_PEERSEC, 0x7f8947fef810, 0x7ffd40deba50) = -1 ENOPROTOOPT (Protocol not available)
10:07:19 fstat(3, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
10:07:19 recvmsg(3, 0x7ffd40dea8a0, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
10:07:19 ppoll([{fd=3, events=POLLIN}], 1, {24, 999975000}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {24, 999930711})
10:07:19 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"l\2\1\1\10\0\0\0\6\0\0\0\17\0\0\0\5\1u\0\3\0\0\0", 24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
10:07:19 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"\10\1g\0\1v\0\0\1b\0\0\0\0\0\0", 16}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 16
10:07:19 recvmsg(3, 0x7ffd40dea950, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
10:07:19 ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 8) = 1 ([{fd=3, revents=POLLIN}])
10:10:01 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"l\4\1\1K\0\0\0\7\0\0\0p\0\0\0\1\1o\0\31\0\0\0", 24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
10:10:01 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"/org/freedesktop/systemd1\0\0\0\0\0\0\0"..., 179}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 179
10:10:01 recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"l\4\1\1@\0\0\0\10\0\0\0q\0\0\0\1\1o\0\31\0\0\0", 24}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1, uid=0, gid=0}}, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 24
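The 24-byte reads in the excerpt begin with the fixed part of a D-Bus message header, so they can be decoded to see what kind of message arrived before and after the pause. The following is a minimal sketch, not part of the original report: the function name `parse_dbus_header` and the returned dictionary keys are made up for illustration, and the byte strings are copied from the strace lines above (strace and Python both use octal escapes).

```python
import struct

# Message type codes from the D-Bus wire protocol.
MESSAGE_TYPES = {1: "METHOD_CALL", 2: "METHOD_RETURN", 3: "ERROR", 4: "SIGNAL"}


def parse_dbus_header(data: bytes) -> dict:
    """Decode the 16-byte fixed part of a D-Bus message header.

    Hypothetical helper for illustration only.
    """
    if len(data) < 16:
        raise ValueError("need at least 16 bytes of header")
    # The first byte selects the byte order: 'l' is little-endian, 'B' is big-endian.
    order = {ord("l"): "<", ord("B"): ">"}[data[0]]
    msg_type, flags, version = data[1], data[2], data[3]
    # Three uint32 values follow: body length, serial, header-fields array length.
    body_length, serial, fields_length = struct.unpack(order + "III", data[4:16])
    return {
        "type": MESSAGE_TYPES.get(msg_type, "UNKNOWN"),
        "flags": flags,
        "version": version,
        "body_length": body_length,
        "serial": serial,
        "fields_length": fields_length,
    }


# Bytes copied from the 10:07:19 and 10:10:01 recvmsg() lines in the excerpt.
before_pause = b"l\2\1\1\10\0\0\0\6\0\0\0\17\0\0\0\5\1u\0\3\0\0\0"
after_pause = b"l\4\1\1K\0\0\0\7\0\0\0p\0\0\0\1\1o\0\31\0\0\0"

for label, raw in (("10:07:19", before_pause), ("10:10:01", after_pause)):
    print(label, parse_dbus_header(raw))
```

Read this way, the message at 10:07:19 is a METHOD_RETURN and the one at 10:10:01 is a SIGNAL whose object path is /org/freedesktop/systemd1. The ppoll() between them is called with a NULL timeout, so the traced process blocks until that signal arrives.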
Finally the service continues and then pauses again for another two minutes. The link above contains all the logs unedited.
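To locate the pauses in the full trace without reading it line by line, the timestamps can be scanned for large jumps. This is a rough sketch under stated assumptions: each line starts with an `HH:MM:SS` timestamp as in the excerpt, and the helper name `find_gaps` and the 60-second default threshold are arbitrary choices for illustration.

```python
from datetime import datetime


def find_gaps(lines, threshold_seconds=60):
    """Return (previous_stamp, current_stamp, gap_seconds) for each large jump.

    Hypothetical helper; assumes every trace line begins with HH:MM:SS.
    """
    gaps = []
    previous = None
    previous_stamp = None
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            current = datetime.strptime(stamp, "%H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None:
            delta = (current - previous).total_seconds()
            if delta < 0:
                delta += 24 * 3600  # the trace crossed midnight
            if delta >= threshold_seconds:
                gaps.append((previous_stamp, stamp, int(delta)))
        previous, previous_stamp = current, stamp
    return gaps


# Two abbreviated lines from the excerpt, surrounding the first pause.
sample = [
    "10:07:19 ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 8) = 1",
    "10:10:01 recvmsg(3, ...) = 24",
]
for start, end, seconds in find_gaps(sample):
    print(f"{start} -> {end}: {seconds} s")
```

On the two sample lines this reports a single gap of 162 seconds, which matches the roughly 3-minute pause described in the symptoms. Running it over the full paste should show where the second pause begins.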
Server details:
CentOS 7.2
Fully up to date
ganglia.x86_64 3.7.2-2.el7 @epel/7
ganglia-gmond.x86_64 3.7.2-2.el7 @epel/7
ganglia-gmond-python.x86_64 3.7.2-2.el7 @epel/7
systemd.x86_64 219-19.el7_2.12 @updates/7
systemd-libs.x86_64 219-19.el7_2.12 @updates/7
systemd-sysv.x86_64 219-19.el7_2.12 @updates/7
dbus.x86_64 1:1.6.12-14.el7_2 @updates/7
dbus-glib.x86_64 0.100-7.el7 @anaconda/7
dbus-libs.x86_64 1:1.6.12-14.el7_2 @updates/7
dbus-python.x86_64 1.1.1-9.el7 @anaconda/7
ls -al /run/systemd/private
srwxrwxrwx 1 root root 0 Sep 7 13:49 /run/systemd/private
Thank you for your time.
I am wondering whether
After=multi-user.target
should be changed to
After=network-online.target