coreos/coreos-assembler

rpm-ostree in supermin VM leaks rofiles-fuse mounts; prevents clean cache unmount

Closed · 2 comments

In #3844, we saw cosa build fail because remounting the cache read-only hit EBUSY when shutting down the supermin VM:

+ mount -o remount,ro /srv/cache
mount: /srv/cache: mount point is busy.
       dmesg(1) may have more information after failed mount system call.
[  321.026584] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00002000
...

Looking at what could be keeping the cache busy, ps aux shows lots of rofiles-fuse processes:

[2024-08-13T18:18:47.381Z] root       306  0.0  0.1 401092  3972 ?        Ssl  15:46   0:00 rofiles-fuse --copyup usr /tmp/rpmostree-rofiles-fuse0QroLi
[2024-08-13T18:18:47.381Z] root       311  0.0  0.1 251564  2488 ?        Ssl  15:46   0:00 rofiles-fuse --copyup etc /tmp/rpmostree-rofiles-fuseANAAgx
[2024-08-13T18:18:47.381Z] root       330  0.0  0.2 474828  4120 ?        Ssl  15:46   0:00 rofiles-fuse --copyup usr /tmp/rpmostree-rofiles-fuseioSZ7r
[2024-08-13T18:18:47.381Z] root       334  0.0  0.1 251564  2712 ?        Ssl  15:46   0:00 rofiles-fuse --copyup etc /tmp/rpmostree-rofiles-fusePPGBR9
...
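
For anyone wanting to check for the same leftovers from inside the VM, here is a minimal sketch that lists the matching mount points from a mount table. It assumes the table is in /proc/mounts format and that the leaked mounts live under the /tmp/rpmostree-rofiles-fuse prefix seen in the ps output above; the function name is made up for illustration.

```shell
#!/bin/sh
# List mount points that look like leaked rpm-ostree rofiles-fuse mounts.
# Reads a mount table in /proc/mounts format: device mountpoint fstype ...
# The prefix comes from the /tmp/rpmostree-rofiles-fuse* paths in the ps
# output; the function name and default argument are illustrative only.
list_leaked_rofiles_mounts() {
    mounts_file="${1:-/proc/mounts}"
    # Field 2 of each line is the mount point; keep those with the prefix.
    awk 'index($2, "/tmp/rpmostree-rofiles-fuse") == 1 { print $2 }' "$mounts_file"
}
```

Calling list_leaked_rofiles_mounts with no argument reads /proc/mounts; passing a file path makes it easy to run against a saved copy of the mount table.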

These are left over from the rpm-ostree compose running scriptlets. rpm-ostree should be unmounting them, but clearly something is going wrong. Failures to unmount are logged to the journal, but we don't have a journal in this environment, so the errors are lost.

I added a brutal workaround in #3844 for now, but I'd like to revert it at some point.

Opened coreos/rpm-ostree#5046 to have rpm-ostree log errors to stderr instead.
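
To illustrate the idea (this is not rpm-ostree's code, just a shell sketch of the behavior we want), the point is that an unmount failure should land on stderr so it shows up even without a journal. The function name and the UMOUNT override variable are hypothetical; UMOUNT only exists so the sketch can be exercised without real mounts.

```shell
#!/bin/sh
# Illustrative only: unmount each given mount point and report failures on
# stderr, so they stay visible in an environment that has no journal.
# UMOUNT is a hypothetical override hook; it defaults to the real umount(8).
unmount_reporting_errors() {
    rc=0
    for mnt in "$@"; do
        # Capture the command's own error text so it can be included.
        if ! err=$("${UMOUNT:-umount}" "$mnt" 2>&1); then
            # Write to stderr rather than relying on a journal.
            echo "failed to unmount $mnt: $err" >&2
            rc=1
        fi
    done
    # Non-zero if any unmount failed.
    return "$rc"
}
```

The function keeps going after a failure, so a single run reports every mount point that could not be unmounted.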

The easiest path is probably to take the RPMs produced by CI in that PR and open a cosa PR that reverts the workaround and adds those rpm-ostree RPMs, to see whether we get more information about the error.

Another one we'll need to revert once this is fixed: #3862