kinvolk/kube-spawn

Incorrect assumptions about 'machines.raw'

A recent commit assumes that /var/lib/machines is a loopback mount of /var/lib/machines.raw. But many people will have /var/lib/machines as its own btrfs partition:

mount|grep machin
/dev/mapper/btrfs-machines on /var/lib/machines type btrfs (rw,relatime,ssd,space_cache,subvolid=5,subvol=/)

So now kube-spawn fails with the following error instead of starting the cluster:

Failed to start cluster: stat /var/lib/machines.raw: no such file or directory

I don't think this approach is correct. It assumes this application is the only thing using nspawn: it tries to unmount /var/lib/machines, which won't work if other containers are running.

I think this should just be documented. The documented approach, if you are using the /var/lib/machines.raw loopback mount method, is to run:
machinectl set-limit 20G

From my understanding, that failure is just another regression caused by #265.

Before the refactoring PR, EnlargeStoragePool was only called when the storage pool image /var/lib/machines.raw existed. If the image does not exist, the btrfs storage volume is already there, and there's nothing to resize, so it just moved on to the next step.

After the PR, that's no longer the case. The original check for the existing image has disappeared, so kube-spawn tries to unmount the image even if it does not exist. As a result, we see a failure like this.

Adding a simple check for the volume image could resolve this issue.

Documenting this is of course good, but I'm afraid that's not enough. In practice, most users would just start kube-spawn with ext4 as the host filesystem, not btrfs. Then systemd-machined will create the loopback mount automatically, and kube-spawn will end up having issues like #281. Not every user is capable of manually resizing the volume. Thus it is really necessary for kube-spawn to resize the volume itself, as in #283.

Sorry, but I'm on vacation right now, so I'm not sure I can work on it within the next 2 weeks.