johnramsden/zectl

While using the directory structure prescribed in docs/plugins/systemdboot.md, kernels are not preserved for snapshots

GregoryLand opened this issue · 8 comments

First off, I want to say that I might have misunderstood the instructions and configured something incorrectly, so I apologize in advance if that is the case. I have my system set up according to the doc in plugins/systemdboot.md, with an /env directory on my EFI partition and a bind mount onto /boot.
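
In case it helps to reproduce, the mounts look roughly like this (the device path is a placeholder, and org.zectl-sway is my active environment):

```sh
# ESP mounted at /efi; the active environment's kernel directory bind-mounted to /boot
mount /dev/disk/by-uuid/XXXX-XXXX /efi
mount --bind /efi/env/org.zectl-sway /boot
```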

Zectl Version: 0.1.1
Problem: the kernel modules in the rootfs did not match the version of the booted kernel.

Today I attempted to roll back my rootfs to a previous snapshot. I did not find instructions on how to restore from a snapshot, but this series of steps made logical sense:

  1. zfs list -r -t snapshot # Done to find snapshot ID
  2. zectl create -e sway@2020-05-08-11:04:59-BeforeNetworkManager swayNoNM # Done to create a bootable version of the snapshot
  3. zectl list # Done to check that everything worked
  4. zectl activate swayNoNM # Switched active image
  5. reboot

This created the rootfs in the expected state, but copied the active 5.4.39-1-lts kernel from /efi/env/org.zectl-sway into /efi/env/org.zectl-swayNoNM. The result was a boot with no access to /efi (or much else), because the rootfs only contained kernel modules for 5.4.38, not 5.4.39, so the running kernel had no loadable modules.

To recover, I used the boot menu to switch back to sway from swayNoNM. I confirmed the issue with diff org.zectl-sway org.zectl-swayNoNM, then copied the kernels I had in /efi/env/org.zectl-default over the top of /efi/env/org.zectl-swayNoNM.
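
For anyone else who hits this, the recovery boiled down to roughly the following (paths from my system):

```sh
# Confirm the two environments' kernel directories differ
diff -r /efi/env/org.zectl-sway /efi/env/org.zectl-swayNoNM

# Replace the wrong kernels with ones matching the snapshot's modules
cp -a /efi/env/org.zectl-default/. /efi/env/org.zectl-swayNoNM/
```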

Workaround: creating full bootable environments instead of snapshots preserves the kernels.
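
In other words, something like this before an upgrade (the name is just an example):

```sh
# A full boot environment gets its own kernel directory under /efi/env/,
# unlike a plain snapshot
zectl create beforeUpgrade
```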

Does zectl preserve the kernels when creating a snapshot, and did I somehow break this mechanism with an incorrect setup?

PS: I love this tool so far.

You're correct, this appears to be a bug in the creation of a boot environment from an existing snapshot.

Thanks for reporting it, I will get it fixed as soon as possible.

Okay, there's a problem I had not considered originally. With systemd-boot there is no snapshot of the kernel; there is only the directory hierarchy of kernels on the ESP, which is named after the boot environment, and that name is the only way we relate the two. A plain snapshot therefore has no kernel associated with it, and there is no way to recover one from it.
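
Concretely, the only link is the name; a minimal illustration (the pool/dataset layout is an assumption):

```sh
# A boot environment and its kernels are related purely by name:
zfs list -H -o name zroot/ROOT/sway   # the BE dataset
ls -d /efi/env/org.zectl-sway         # its kernel directory on the ESP
# A snapshot such as zroot/ROOT/sway@snap has no /efi/env/ directory of its own.
```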

I'm interested in hearing opinions on the matter, I could do the following:

  • Add another flag that specifies another environment whose kernel should be used; without this option, the active environment's kernel is used.
  • Specify in the documentation that when creating from an existing snapshot with systemd-boot, the active environment's kernel will be used.
  • When creating from an existing snapshot, don't install a kernel at all, and make the user responsible for putting the correct kernel in place.
  • Other options?

Thoughts @GregoryLand, @zhimsel, @mkessler001?

The way I expected it to behave was for it to back up the kernel of the environment that was running when the snapshot was created. It did not occur to me that this utility could be used to create a snapshot of a non-running environment.

My use case looks something like this:

  1. zectl snapshot "currently running system"
  2. sudo pacman -Syu

I really only care about the snapshot if something explodes, and I will probably delete it later once a few exist or I am sure everything is OK.

For my use case it would be OK to create a "backup" of the active environment and mark it as associated with the snapshot: zip up the kernel and stash it someplace, to be unzipped if that snapshot ever gets called up. Is there a way to store that kernel as part of the rootfs that just got snapshotted? Something like the following (a rough shell sketch of the same flow appears after the list):

  1. zectl snapshot "snapshotname"
    a) tar.gz the existing env active on that rootfs and save it on the root filesystem
    b) create the zfs snapshot
    c) delete the created tar.gz file

  2. zectl create -e "snapshotname" "envToCreateName"
    a) Mount the snapshot somewhere safe
    b) Extract the env archive from the snapshot and use it to create a new env directory
    c) Push the new env folder and create the needed boot configs to make it bootable.
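
A rough shell sketch of that flow (the pool/dataset names, archive path, and loader-entry step are all assumptions for illustration):

```sh
# --- zectl snapshot "snapshotname" (proposed) ---
tar -czf /zectl-env-backup.tar.gz -C /efi/env org.zectl-sway   # a) archive the active env onto the rootfs
zfs snapshot zroot/ROOT/sway@snapshotname                      # b) the archive is captured inside the snapshot
rm /zectl-env-backup.tar.gz                                    # c) clean it off the live rootfs

# --- zectl create -e "snapshotname" "envToCreateName" (proposed) ---
zfs clone zroot/ROOT/sway@snapshotname zroot/ROOT/envToCreateName  # a) clone the snapshot
mount -t zfs zroot/ROOT/envToCreateName /mnt                       #    (assumes mountpoint=legacy)
mkdir /efi/env/org.zectl-envToCreateName                           # b) recreate the env dir from the archive
tar -xzf /mnt/zectl-env-backup.tar.gz \
    -C /efi/env/org.zectl-envToCreateName --strip-components=1
# c) zectl would then write the systemd-boot loader entry to make it bootable
```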

@GregoryLand That's a great idea. It will only work for snapshots explicitly created using zectl snapshot ..., and not for regular ZFS snapshots, but I guess there's nothing we can do about that other than to specify in the documentation that the kernel will only be restored if the snapshot was taken with zectl.

Just to look at expanding to the general case: what if we kept a copy of the ENV on every boot environment's rootfs? Instead of zectl snapshot creating it, the backup would always exist, so a snapshot made from a non-active rootfs would still have its ENV saved with it. The trouble I see with this is deciding when to update the backed-up env.

Yeah, I was thinking about that. The only way it would work is if the user added some sort of recurring task that refreshes the backup.

If it's implemented as a known location on disk, there's no reason someone couldn't have a recurring job that updates that location, so that creating an environment from an existing snapshot works even when the snapshot wasn't created by zectl. If one does not create this recurring job, no big deal: we just don't restore the kernel.
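
A minimal sketch of such a job, assuming the backup lives at /zectl-env-backup.tar.gz and the active env is org.zectl-sway:

```sh
#!/bin/sh
# e.g. /etc/cron.daily/zectl-env-backup (hypothetical): refresh the on-rootfs
# copy of the active environment's kernels so later snapshots carry one.
tar -czf /zectl-env-backup.tar.gz.new -C /efi/env org.zectl-sway
mv /zectl-env-backup.tar.gz.new /zectl-env-backup.tar.gz  # replace atomically
```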

Right now I'm thinking there are a few options.

  • Mandate another dataset where the kernel backups are all kept; the user can update them at will
  • Just stick it on the / dataset at a known location
    I'm inclined toward this option just so we don't need to create another dataset, but we would probably need a top-level directory, in case the user is mounting other datasets nested under root that are not part of the snapshot. I'm hesitant to use something like /zectl/env ..., since it would be non-standard, but it might be the only option. Thoughts on this?
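
To make the second option concrete, the layout I have in mind would be something like this (all names hypothetical):

```sh
# A known location on the / dataset; since it lives on the root filesystem,
# every snapshot of / captures it automatically.
ls /zectl/env/
# vmlinuz-linux-lts  initramfs-linux-lts.img
```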

Keeping it on / is kind of elegant. It means that everything needed to restore from a snapshot is part of the snapshot itself.

Opened #17 with the outline of the plan. Let me know if you have any thoughts @GregoryLand.