void-linux/void-runit

network shares hang on restart / shutdown - consequence: hard shutdown required

rjl6789 opened this issue · 14 comments

Hi,

If I have network shares mounted (nfs, nfs4, cifs), they cannot be unmounted or remounted read-only during shutdown / restart, so the shutdown process hangs and I need to do a forced shutdown. The reason is that in /etc/runit/3 the network services are taken offline before the drives are unmounted / remounted read-only.

I've created my own /etc/rc.pre-shutdown script (below), called by /etc/runit/3 before the services are closed, that unmounts (or remounts read-only) the network shares. This works.

Apologies if I've missed an obvious option / setting that renders my "fix" not needed, I'm just an enthusiastic Linux user, as opposed to developer.

Hope this is useful for someone.

Rob

#!/bin/sh
# /etc/rc.pre-shutdown: unmount (or remount read-only) network shares
# before runit stage 3 takes the network services offline.

echo "   trying to unmount network shares..."

umount -a -r -t nfs,nfs4,cifs

if grep -qs -e 'nfs ' -e 'nfs4 ' -e 'cifs ' /proc/mounts; then
   echo
   echo "...... failed to unmount some network shares ......"
   rem_shares=$(grep -e 'nfs ' -e 'nfs4 ' -e 'cifs ' /proc/mounts)
   if echo "$rem_shares" | grep -qs 'rw,'; then
         echo "failed to remount read-only as well"
         echo "will attempt a forced, lazy unmount"
         echo "if this hangs, a hard shutdown will be needed..."
         umount -a -f -l -t nfs,nfs4,cifs
   elif echo "$rem_shares" | grep -qs 'ro,'; then
         echo "shares were remounted read-only instead."
         echo "hopefully this is ok. If it hangs, a hard shutdown is required"
   else
         echo "I should not be here..."
   fi
   echo
fi
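As a quick sanity check, the grep pattern used in the script can be exercised against a sample /proc/mounts line (the server path and mountpoint below are made up for illustration):

```shell
# Hypothetical /proc/mounts entry for an NFSv4 share
line='server:/export /mnt/nfs nfs4 rw,relatime,vers=4.2 0 0'

# Same match logic as the script above: does the line look like a network share?
if echo "$line" | grep -q -e 'nfs ' -e 'nfs4 ' -e 'cifs '; then
    echo "matched"
fi
```

The trailing space in each pattern matters: it anchors the match to the filesystem-type field so that, say, a mountpoint merely containing the string "nfs" is not misdetected.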

I wonder if anybody has / on NFS.

Most init systems are able to pivot_root(2) and chroot(2) back into the initramfs and then execute halt/reboot/poweroff from within the initramfs.
In dracut's case the script /usr/lib/dracut/modules.d/99shutdown/shutdown.sh is what would be executed; it kills the remaining processes, unmounts the root filesystem and finally halts/reboots/powers off the system once everything is cleanly unmounted.

I'm not sure if this would be possible with runit; runit(8), which executes the stages, will call reboot(2) after stage 3 exits.
Maybe it's possible to pivot_root(2) and chroot(2) from stage 3, but this seems like a major hack and could lead to other unexpected issues: the initramfs shutdown would kill itself because the parent process runit(8) is still there with the old root filesystem, and killing pid 1 would then result in a kernel panic.

Maybe the fix should be within runit, to enable chrooting back into the initramfs?

Not sure if the reporter even has / on NFS ... perhaps that should be a second issue.

From what I understand, the hang is here:

void-runit/3, line 46 in commit 0566391:

umount -r -a -t nosysfs,noproc,nodevtmpfs,notmpfs

Even if stage 3 went through a chroot back into the initramfs, that is orthogonal to the problem reported. Networking services would still be down long before that point is reached.

One way to solve the issue with NFS shares is to have two runsvdir instances. The first would be run after existing runit core-services are started, and would supervise networking services. Then certain tasks related to mounting network shares would be run, then the rest of the services would be run. At shutdown the non-network services would be brought down, then the network shares unmounted, then the network services. The idea could use some more brainstorming.
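The staged order could be sketched roughly like this (all paths are hypothetical; /run/runit-net stands in for the second runsvdir's service directory, and the echo wrapper makes it a dry run rather than something to paste into stage 3 as-is):

```shell
# Dry run of the staged shutdown order described above; swap run() for
# direct execution in a real /etc/runit/3. Paths are assumptions.
run() { echo "+ $*"; }

run sv force-stop '/var/service/*'     # 1. stop ordinary services
run umount -a -r -t nfs,nfs4,cifs     # 2. unmount network shares
run sv force-stop '/run/runit-net/*'  # 3. stop networking services last
```

The point of the split is ordering: the network stack only goes down after nothing is left holding a network mount open.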

regarding LVM/LUKS... @bougyman why was the deactivation added in, anyway? Is it necessary to deactivate the VGs and crypt devices at shutdown?

Edit: pinged bougy because of this commit, if that was unclear: b935de9

/etc/rc.pre-shutdown (or /etc/rc.shutdown) don't work around the problem for my NFS mount on shutdown. Has anything changed?

@rjl6789's provided script works well for me. You need to hook the pre-shutdown script manually into /etc/runit/3 before the services are stopped.

Here's the beginning of my /etc/runit/3 file:

#!/bin/sh
# vim: set ts=4 sw=4 et:

PATH=/usr/bin:/usr/sbin

. /etc/runit/functions
detect_virt
[ -r /etc/rc.conf ] && . /etc/rc.conf

if [ -e /run/runit/reboot ]; then
    chmod 100 /run/runit/reboot
fi

echo
msg ""
[ -x /etc/rc.pre-shutdown ] && msg "Starting pre-shutdown hook" && /etc/rc.pre-shutdown

msg "Waiting for services to stop..."
sv force-stop /var/service/*
sv exit /var/service/*

[ -x /etc/rc.shutdown ] && /etc/rc.shutdown

I have also added sync; at the beginning of the pre-shutdown script and a final lazy umount in the regular shutdown script:

# NFS unmount
# make a final forced (-f), lazy (-l) attempt to unmount any network
# share that might cause interrupts
umount -f -l -a -t nfs,nfs4,cifs

In tandem these have worked well so far to avoid the interrupts.

@D-Nice thanks for the clarification, I was just able to reboot with this workaround now. Will it break again on a package update?

If you use the xbps preserve config option, then it should be fine. /etc/runit/3 is NOT a conffile, so it will be overwritten if you do not use the preserve option.

@svenper the system config file in question is in /usr/share/xbps.d/xbps.conf by default. I didn't see it documented anywhere in the wikis, but it was in the xbps-install manual under FILES.

@CameronNemo how often, or when, do the runit scripts ever get changed? And if the preserve option is used, are we notified of a conflict?

Files in /usr/share will always be overwritten on updates; user configuration files for xbps go into /etc/xbps.d (see man xbps.d).
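Based on the preserve option mentioned above, a drop-in along these lines could go in /etc/xbps.d (the filename is arbitrary; double-check the exact keyword and semantics in xbps.d(5) before relying on it):

```
# /etc/xbps.d/10-preserve.conf
preserve=true
```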

Is there a way to use /etc/rc.conf or similar to achieve an equivalent result to this patch of /etc/runit/3? The way I understand it right now, the options are either not updating runit, or manually patching after every update.

[ -x /etc/rc.pre-shutdown ] && msg "Starting pre-shutdown hook" && /etc/rc.pre-shutdown

@svenper best path forward IMO is to propose a patch to this repository with that line in it.

In the meantime hold the runit package and manually reapply the patch on the quite infrequent updates.
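Holding the package can be done with xbps-pkgdb's mode flag; shown here as a dry run (drop the echo and run as root to actually pin it):

```shell
# Pin runit so system updates skip it, then unpin once the /etc/runit/3
# patch has been re-applied. The echo makes this a harmless dry run.
echo xbps-pkgdb -m hold runit
echo xbps-pkgdb -m unhold runit
```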