Generate rEFInd manual boot stanzas from Btrfs snapshots

refind-btrfs

Description

This tool is used to automate a few tedious tasks required to boot into Btrfs snapshots from rEFInd. It is to rEFInd what grub-btrfs is to GRUB.

What it does is the following:

  • Gathers information about block devices present in the system
  • Identifies the ESP (either by GPT GUID or MBR ID)
  • Gathers information about mounted filesystems (from mtab) which are present on all of the found block devices
  • Identifies the root mount point and gathers information about the subvolume which is mounted at said mount point
  • Searches for snapshots of the identified subvolume in the configured directory (or directories)
  • Searches for rEFInd's main config file on the ESP and parses it to extract manual boot stanzas from it (included configs are also analyzed, if present)
  • Selects the configured number of latest snapshots and uses them as they are if they are writable; if any are not, it does one of the following (depending on the configuration):
    • sets their read-only flag to false, thus making them writable
    • creates new writable snapshots from them in the configured location
  • Aligns the root mount point in the fstab file of each selected snapshot with the snapshot itself
  • Deletes outdated previously created writable snapshots (if any exist)
  • Generates new manual boot stanzas from identified ones where every relevant field is aligned with each selected snapshot
  • Finally, it saves the generated manual boot stanzas in separate config files (outputs them to a subdirectory) and includes each file in the main config file so as not to needlessly clutter it

In case a separate /boot partition is detected only the fields relevant to / are modified ("subvol" and/or "subvolid") while the "loader" and "initrd" fields (the former may also be nested within the "options" field) remain unaffected.
It goes without saying that with this kind of setup a problematic kernel upgrade cannot be mitigated by simply booting into a snapshot, because the kernel and initramfs images live outside the snapshots.

This tool will also detect a situation where / is mounted as a snapshot (which means that you've already booted into one), issue a warning and simply exit whereas, for instance, Snapper will happily continue creating its snapshots, regardless. This behavior is configurable and enabled by default.

Prerequisites

The following conditions (some are probably superfluous at this point) must be satisfied in order for this tool to function correctly:

  • mounted ESP (no automatic discovery and/or mounting is supported)
  • Btrfs formatted filesystem with a subvolume mounted as /
  • at least one snapshot of the root subvolume
  • rEFInd installation present on the ESP
  • at least one manual boot stanza (found in rEFInd's main config file or in any of the additional config files included within it) defined such that (see the ArchWiki for an example) its own "options" field or any such field belonging to at least one of its sub-menus contains definitions of the following boot loader options:
    • the "root" option must be matched with the root partition (by PARTUUID or PARTLABEL), its filesystem (by UUID or LABEL) or with a block device (by name) which itself represents the root partition
    • the "rootflags" option must define a "subvol" suboption which is matched with the root subvolume's logical path and/or a "subvolid" suboption which is matched with the root subvolume's ID
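The two required boot loader options can be checked with a few lines of Python. This is only a minimal sketch of the idea, not the tool's actual parser (which is generated with ANTLR4); the function names `parse_root_options` and `is_usable` are hypothetical.

```python
def parse_root_options(options: str) -> dict:
    """Extract "root" and the "subvol"/"subvolid" suboptions of "rootflags"."""
    result = {"root": None, "subvol": None, "subvolid": None}
    for token in options.split():
        # Split on the first "=" only, so "root=PARTUUID=..." keeps its value intact.
        key, _, value = token.partition("=")
        if key == "root":
            result["root"] = value
        elif key == "rootflags":
            for flag in value.split(","):
                name, _, flag_value = flag.partition("=")
                if name in ("subvol", "subvolid"):
                    result[name] = flag_value
    return result


def is_usable(parsed: dict) -> bool:
    """A stanza qualifies if "root" is set and "subvol" and/or "subvolid" is present."""
    return parsed["root"] is not None and (
        parsed["subvol"] is not None or parsed["subvolid"] is not None
    )
```

Matching the extracted values against the actual root partition and subvolume is a separate step which is not shown here.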

Installation

This tool is currently available only in the AUR which means that Arch Linux users (as well as users of derivative distributions, I imagine) can easily install it.

It comes with a script (refind-btrfs) which can be used to perform the described steps on demand (root privileges are required to run it). There is also a systemd service, aptly named refind-btrfs.service, which runs the tool in a background mode of operation: the described steps are performed automatically once a change (snapshot creation or deletion) happens in the watched snapshot directories, which are the same ones as those in which it searches for snapshots. If you are using Snapper along with its capability to take regular snapshots on boot, this service should take these into account as well because it is set to start before Snapper's relevant service (the one named snapper-boot.service) does.
Before running the script for the first time or enabling and starting the service make sure to at least check and perhaps modify the config file (/etc/refind-btrfs.conf) to suit your own needs.

If you wish to check the current status and log output of the running service you can do so by executing:

systemctl status refind-btrfs
journalctl -u refind-btrfs -b

Alternatively, there exists a PyPI package but bear in mind that since libbtrfsutil isn't available on PyPI it needs to be already present in the system site packages (its Python bindings, to be precise) because it cannot be automatically pulled in as a dependency. Chances are that it is available for your distribution of choice (search for a package named "btrfs-progs") but you most probably already have it installed as I suppose you are using Btrfs, after all.
Also, every file contained in this directory should be copied to the following locations:

  • refind-btrfs script to /usr/bin (or wherever it is you keep your system-wide executables)
  • refind-btrfs.conf-sample as refind-btrfs.conf (without the "-sample" suffix) to /etc
  • refind-btrfs.service to /usr/lib/systemd/system (if you are using systemd and wish to utilize the snapshot directory watching feature)

In case the custom generated boot stanza's icon feature (explained in the next section) is desired it can initially be enabled by installing this package with the following command:

pip install refind-btrfs[custom_icon]

You should also create an empty directory named "refind-btrfs" in /var/lib as the tool expects that it is present. Additionally, if you wish to be able to use the Btrfs logo embedding mode of custom icon generation you should also copy the "icons" directory into the previously created one.

Configuration

The configuration file can be found at /etc/refind-btrfs.conf, and each option is thoroughly explained in the sample config file.
In case you've opted to use the provided systemd service and wish to change the search directories (in this context, these are actually watched directories) in the config file while it is running you must restart it manually after doing so because the directory observer is started only once and an automatic restart is not performed.

The default configuration is meant to enable seamless integration with Snapper simply because I'm using it but the tool itself doesn't depend on it and ought to function with different setups. Also, by default the tool is configured for creating new writable snapshots intended for booting instead of in-place modification of the found snapshots' read-only flags as I believe this is the safer (or perhaps even saner) choice.
Timeshift users can try setting the default snapshot search directory to "/run/timeshift/backup/timeshift-btrfs/snapshots" and the corresponding maximum search depth to three.

If you're having trouble with the ESP being automatically located, the "esp_uuid" option could prove to be useful. If an actual UUID is provided (not the default, empty one), this value will be used to compare partition UUIDs (returned by lsblk) instead of comparing their types with hardcoded GPT UUID or MBR ID values.
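The ESP lookup described above can be sketched roughly as follows. This is an illustration under assumptions, not the tool's own code: it assumes the JSON was produced by something like `lsblk --json --output NAME,PARTTYPE,PARTUUID`, and the function name `find_esp` is hypothetical. The two type constants are the standard ESP identifiers for GPT and MBR.

```python
import json

# Well-known partition type identifiers of an EFI system partition.
ESP_GPT_GUID = "c12a7328-f81f-11d2-ba4b-00a0c93ec93b"
ESP_MBR_ID = "0xef"


def find_esp(lsblk_json: str, esp_uuid: str = ""):
    """Return the name of the ESP found in lsblk's JSON output, or None."""

    def walk(devices):
        # Partitions are nested under their parent device as "children".
        for device in devices:
            yield device
            yield from walk(device.get("children") or [])

    tree = json.loads(lsblk_json)
    for device in walk(tree.get("blockdevices", [])):
        part_type = (device.get("parttype") or "").lower()
        part_uuid = (device.get("partuuid") or "").lower()
        if esp_uuid:
            # A configured UUID takes precedence over the type comparison.
            if part_uuid == esp_uuid.lower():
                return device["name"]
        elif part_type in (ESP_GPT_GUID, ESP_MBR_ID):
            return device["name"]
    return None
```

Leaving `esp_uuid` empty mirrors the default configuration, in which only the partition types are compared.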

Custom icons for generated boot stanzas are also supported. By default, the source boot stanza's icon is reused. It is possible to provide the path to one's own custom icon or to embed the Btrfs logo (which comes in two variants and three sizes per variant) into the source boot stanza's icon instead. This combined icon is then used as the generated boot stanza's icon.
In order for these two additional modes of operation (not the default one) to work, an optional dependency has to be installed, namely the Pillow library, which can be installed from the official Arch Linux repository or from PyPI.

It is imperative that you don't just blindly try to boot into a given snapshot (simply because no errors were reported) before verifying the generated manual boot stanza, either by inspecting the file contents in which it was saved or by viewing the boot loader options using rEFInd and also not before verifying the chosen snapshot's fstab file.

Example

Given a setup such as this one:

  • device /dev/nvme0n1 where:
    • the ESP is on /dev/nvme0n1p3 mounted at /efi
    • / is on /dev/nvme0n1p8
    • /boot is included in /dev/nvme0n1p8 (not a separate partition)
  • the subvolume mounted as / is named @
  • fstab file's root mount point:
UUID=95250e8a-5870-45df-a7b3-3b3ee8873c16 / btrfs rw,noatime,compress-force=zstd:2,ssd,space_cache=v2,commit=15,subvolid=256,subvol=/@ 0 0
  • manual boot stanza defined in the refind.conf file (rEFInd's main config file, in this case):
menuentry "Arch Linux - Stable" {
    icon /EFI/refind/icons/os_arch.png
    volume ARCH
    loader /@/boot/vmlinuz-linux
    initrd /@/boot/initramfs-linux.img
    options "root=PARTUUID=048d6fcd-c88c-504d-bd51-dfc0a5bf762d rw add_efi_memmap rootflags=subvol=@ initrd=@\boot\intel-ucode.img"
    submenuentry "Boot - fallback" {
        initrd /@/boot/initramfs-linux-fallback.img
    }
    submenuentry "Boot - terminal" {
        add_options "systemd.unit=multi-user.target"
    }
}
  • five read-only snapshots located in the /.snapshots directory where this directory is itself mounted as a subvolume named @snapper-root (this last bit isn't particularly relevant):

    Absolute Path            Time of Creation      Subvolume ID
    /.snapshots/1/snapshot   10-12-2020 01:00:00   498
    /.snapshots/2/snapshot   11-12-2020 02:00:00   499
    /.snapshots/3/snapshot   12-12-2020 03:00:00   500
    /.snapshots/4/snapshot   13-12-2020 04:00:00   501
    /.snapshots/5/snapshot   14-12-2020 05:00:00   502
  • refind-btrfs.conf file changed such that the "selection_count" option is set to 3 instead of the default 5

When run, this tool should select the latest three snapshots (3, 4 and 5 from the list) and create new, writable ones from these in the directory configured by the "destination_dir" option where each snapshot is named by formatting the time of creation ("YYYY-mm-dd_HH-MM-SS") of the snapshot it was created from, adding a "rwsnap" prefix to it and also adding the original snapshot's subvolume ID as a suffix. In the rare case when different snapshots have identical timestamps their monotonic numerical IDs are there to ensure uniqueness.

Afterwards, the resultant snapshots' generated names should look like this:

  • rwsnap_2020-12-12_03-00-00_ID500,
  • rwsnap_2020-12-13_04-00-00_ID501 and
  • rwsnap_2020-12-14_05-00-00_ID502
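The selection and naming steps of this example can be reproduced with a short sketch. The `Snapshot` type and both function names are hypothetical and only serve the illustration; the tool itself obtains this information through libbtrfsutil.

```python
from datetime import datetime
from typing import List, NamedTuple


class Snapshot(NamedTuple):
    path: str
    created: datetime
    subvolume_id: int


def select_latest(snapshots: List[Snapshot], count: int) -> List[Snapshot]:
    """Return the newest snapshots first; the ID breaks ties between equal timestamps."""
    ordered = sorted(snapshots, key=lambda s: (s.created, s.subvolume_id), reverse=True)
    return ordered[:count]


def writable_name(snapshot: Snapshot) -> str:
    """Build "rwsnap_<time of creation>_ID<subvolume ID>" for a source snapshot."""
    timestamp = snapshot.created.strftime("%Y-%m-%d_%H-%M-%S")
    return f"rwsnap_{timestamp}_ID{snapshot.subvolume_id}"


snapshots = [
    Snapshot("/.snapshots/1/snapshot", datetime(2020, 12, 10, 1, 0, 0), 498),
    Snapshot("/.snapshots/2/snapshot", datetime(2020, 12, 11, 2, 0, 0), 499),
    Snapshot("/.snapshots/3/snapshot", datetime(2020, 12, 12, 3, 0, 0), 500),
    Snapshot("/.snapshots/4/snapshot", datetime(2020, 12, 13, 4, 0, 0), 501),
    Snapshot("/.snapshots/5/snapshot", datetime(2020, 12, 14, 5, 0, 0), 502),
]
names = [writable_name(s) for s in select_latest(snapshots, 3)]
```

Here `names` holds the three names listed above, starting with the most recent one.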

This naming scheme makes sense to me because when choosing a snapshot to boot from you most probably want to know when the original snapshot was created and not the one created from it because the time delay depends on when this tool was run and, if sufficiently large, can completely mislead you. If you've chosen to use the systemd service this delay shouldn't be significant (measuring a mere few seconds at worst, ideally).

The most recent snapshot's fstab file should (after being modified) contain a root mount point which looks like this:

UUID=95250e8a-5870-45df-a7b3-3b3ee8873c16 / btrfs rw,noatime,compress-force=zstd:2,ssd,space_cache=v2,commit=15,subvolid=503,subvol=/@/root/.refind-btrfs/rwsnap_2020-12-14_05-00-00_ID502 0 0

I'm assuming here that the next available subvolume ID was 503 (an increment of one) which implies that the writable snapshot was created immediately after the original snapshot was taken but that doesn't necessarily have to be the case and its specific value doesn't ultimately matter that much as long as it directly corresponds to the newly created snapshot which it absolutely should (otherwise, mounting it as / would fail due to the mismatch).
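Aligning the root mount point boils down to rewriting two mount options in a single fstab line. The following is a simplified sketch of that idea with a hypothetical function name; it normalizes the whitespace between fields to single spaces and touches only the entry whose mount point is /.

```python
def align_root_entry(fstab_line: str, subvol: str, subvolid: int) -> str:
    """Point the "subvol" and "subvolid" options of the root entry at a snapshot."""
    fields = fstab_line.split()
    # Leave comments, malformed lines and non-root entries untouched.
    if fstab_line.lstrip().startswith("#") or len(fields) < 4 or fields[1] != "/":
        return fstab_line
    options = []
    for option in fields[3].split(","):
        name = option.partition("=")[0]
        if name == "subvol":
            option = f"subvol={subvol}"
        elif name == "subvolid":
            option = f"subvolid={subvolid}"
        options.append(option)
    fields[3] = ",".join(options)
    return " ".join(fields)
```

Applied to the original root entry with the logical path and ID of the newly created snapshot, it yields the line shown above.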

With this setup the newly created snapshot ended up being nested under the root subvolume but you can of course make your own adjustments as you see fit. This tool will only create the destination directory in case it doesn't exist. It won't do anything other than that.
I've personally created another subvolume named @rw-snapshots directly under the default filesystem subvolume (ID 5) and mounted it at /root/.refind-btrfs. In my case the logical path of rwsnap_2020-12-14_05-00-00_ID502 would be /@rw-snapshots/rwsnap_2020-12-14_05-00-00_ID502.

A generated manual boot stanza's filename is formatted like "{volume}_{loader}.conf" and converted to all lowercase letters which would result in, for this example, a file named "arch_vmlinuz-linux.conf". This file is then saved in a subdirectory (relative to rEFInd's root directory) named "btrfs-snapshot-stanzas" and finally included in the main config file by appending an "include" directive which would, again for this example, look like this: "include btrfs-snapshot-stanzas/arch_vmlinuz-linux.conf". This last step is performed only once, during an initial run. Afterwards, it is detected as already being included in the main config file.
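The filename and include directive from this example can be derived with a small sketch. The function names are hypothetical, and it is my assumption here that the "{loader}" part is the final component of the loader path.

```python
from pathlib import PurePosixPath


def stanza_filename(volume: str, loader: str) -> str:
    """Format "{volume}_{loader}.conf" in lowercase, using the loader's file name."""
    return f"{volume}_{PurePosixPath(loader).name}.conf".lower()


def include_directive(filename: str, subdirectory: str = "btrfs-snapshot-stanzas") -> str:
    """Build the directive which is appended to the main config file."""
    return f"include {subdirectory}/{filename}"


def ensure_included(main_config: str, directive: str) -> str:
    """Append the directive unless some line of the config already contains it."""
    if any(line.strip() == directive for line in main_config.splitlines()):
        return main_config
    return main_config.rstrip("\n") + "\n" + directive + "\n"
```

Calling `ensure_included` a second time returns its input unchanged, which corresponds to the directive being appended only during the initial run.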

You are free to rearrange the appended include directives however you want; this tool does not care about where exactly they appear in the main config file. This is particularly useful in case you've defined multiple boot stanzas (each one pointing to a different kernel image, for example) and wish to alter the order of the boot menu entries.

The generated file's contents (representing the generated stanza) should look like this:

menuentry "Arch Linux - Stable (rwsnap_2020-12-14_05-00-00_ID502)" {
    icon /EFI/refind/icons/os_arch.png
    volume ARCH
    loader /@/root/.refind-btrfs/rwsnap_2020-12-14_05-00-00_ID502/boot/vmlinuz-linux
    initrd /@/root/.refind-btrfs/rwsnap_2020-12-14_05-00-00_ID502/boot/initramfs-linux.img
    options "root=PARTUUID=048d6fcd-c88c-504d-bd51-dfc0a5bf762d rw add_efi_memmap rootflags=subvol=@/root/.refind-btrfs/rwsnap_2020-12-14_05-00-00_ID502 initrd=@\root\.refind-btrfs\rwsnap_2020-12-14_05-00-00_ID502\boot\intel-ucode.img"
    submenuentry "Arch Linux - Stable (rwsnap_2020-12-13_04-00-00_ID501)" {
        loader /@/root/.refind-btrfs/rwsnap_2020-12-13_04-00-00_ID501/boot/vmlinuz-linux
        initrd /@/root/.refind-btrfs/rwsnap_2020-12-13_04-00-00_ID501/boot/initramfs-linux.img
        options "root=PARTUUID=048d6fcd-c88c-504d-bd51-dfc0a5bf762d rw add_efi_memmap rootflags=subvol=@/root/.refind-btrfs/rwsnap_2020-12-13_04-00-00_ID501 initrd=@\root\.refind-btrfs\rwsnap_2020-12-13_04-00-00_ID501\boot\intel-ucode.img"
    }
    submenuentry "Arch Linux - Stable (rwsnap_2020-12-12_03-00-00_ID500)" {
        loader /@/root/.refind-btrfs/rwsnap_2020-12-12_03-00-00_ID500/boot/vmlinuz-linux
        initrd /@/root/.refind-btrfs/rwsnap_2020-12-12_03-00-00_ID500/boot/initramfs-linux.img
        options "root=PARTUUID=048d6fcd-c88c-504d-bd51-dfc0a5bf762d rw add_efi_memmap rootflags=subvol=@/root/.refind-btrfs/rwsnap_2020-12-12_03-00-00_ID500 initrd=@\root\.refind-btrfs\rwsnap_2020-12-12_03-00-00_ID500\boot\intel-ucode.img"
    }
}

As you've probably noticed, this tool leverages rEFInd's overriding features, that is to say "submenuentry" sections are used to incorporate successive snapshots into the stanza itself by overriding the "loader", "initrd" and "options" fields of the main boot stanza which itself represents the latest snapshot.

If you've configured this tool to also take into account the original boot stanza's sub-menus the resultant generated boot stanza should look like this:

menuentry "Arch Linux - Stable (rwsnap_2020-12-14_05-00-00_ID502)" {
    icon /EFI/refind/icons/os_arch.png
    volume ARCH
    loader /@/root/.refind-btrfs/rwsnap_2020-12-14_05-00-00_ID502/boot/vmlinuz-linux
    initrd /@/root/.refind-btrfs/rwsnap_2020-12-14_05-00-00_ID502/boot/initramfs-linux.img
    options "root=PARTUUID=048d6fcd-c88c-504d-bd51-dfc0a5bf762d rw add_efi_memmap rootflags=subvol=@/root/.refind-btrfs/rwsnap_2020-12-14_05-00-00_ID502 initrd=@\root\.refind-btrfs\rwsnap_2020-12-14_05-00-00_ID502\boot\intel-ucode.img"
    submenuentry "Boot - fallback (rwsnap_2020-12-14_05-00-00_ID502)" {
        initrd /@/root/.refind-btrfs/rwsnap_2020-12-14_05-00-00_ID502/boot/initramfs-linux-fallback.img
    }
    submenuentry "Boot - terminal (rwsnap_2020-12-14_05-00-00_ID502)" {
        add_options "systemd.unit=multi-user.target"
    }
    submenuentry "Arch Linux - Stable (rwsnap_2020-12-13_04-00-00_ID501)" {
        loader /@/root/.refind-btrfs/rwsnap_2020-12-13_04-00-00_ID501/boot/vmlinuz-linux
        initrd /@/root/.refind-btrfs/rwsnap_2020-12-13_04-00-00_ID501/boot/initramfs-linux.img
        options "root=PARTUUID=048d6fcd-c88c-504d-bd51-dfc0a5bf762d rw add_efi_memmap rootflags=subvol=@/root/.refind-btrfs/rwsnap_2020-12-13_04-00-00_ID501 initrd=@\root\.refind-btrfs\rwsnap_2020-12-13_04-00-00_ID501\boot\intel-ucode.img"
    }
    submenuentry "Boot - fallback (rwsnap_2020-12-13_04-00-00_ID501)" {
        loader /@/root/.refind-btrfs/rwsnap_2020-12-13_04-00-00_ID501/boot/vmlinuz-linux
        initrd /@/root/.refind-btrfs/rwsnap_2020-12-13_04-00-00_ID501/boot/initramfs-linux-fallback.img
        options "root=PARTUUID=048d6fcd-c88c-504d-bd51-dfc0a5bf762d rw add_efi_memmap rootflags=subvol=@/root/.refind-btrfs/rwsnap_2020-12-13_04-00-00_ID501 initrd=@\root\.refind-btrfs\rwsnap_2020-12-13_04-00-00_ID501\boot\intel-ucode.img"
    }
    submenuentry "Boot - terminal (rwsnap_2020-12-13_04-00-00_ID501)" {
        loader /@/root/.refind-btrfs/rwsnap_2020-12-13_04-00-00_ID501/boot/vmlinuz-linux
        initrd /@/root/.refind-btrfs/rwsnap_2020-12-13_04-00-00_ID501/boot/initramfs-linux.img
        options "root=PARTUUID=048d6fcd-c88c-504d-bd51-dfc0a5bf762d rw add_efi_memmap rootflags=subvol=@/root/.refind-btrfs/rwsnap_2020-12-13_04-00-00_ID501 initrd=@\root\.refind-btrfs\rwsnap_2020-12-13_04-00-00_ID501\boot\intel-ucode.img systemd.unit=multi-user.target"
    }
    submenuentry "Arch Linux - Stable (rwsnap_2020-12-12_03-00-00_ID500)" {
        loader /@/root/.refind-btrfs/rwsnap_2020-12-12_03-00-00_ID500/boot/vmlinuz-linux
        initrd /@/root/.refind-btrfs/rwsnap_2020-12-12_03-00-00_ID500/boot/initramfs-linux.img
        options "root=PARTUUID=048d6fcd-c88c-504d-bd51-dfc0a5bf762d rw add_efi_memmap rootflags=subvol=@/root/.refind-btrfs/rwsnap_2020-12-12_03-00-00_ID500 initrd=@\root\.refind-btrfs\rwsnap_2020-12-12_03-00-00_ID500\boot\intel-ucode.img"
    }
    submenuentry "Boot - fallback (rwsnap_2020-12-12_03-00-00_ID500)" {
        loader /@/root/.refind-btrfs/rwsnap_2020-12-12_03-00-00_ID500/boot/vmlinuz-linux
        initrd /@/root/.refind-btrfs/rwsnap_2020-12-12_03-00-00_ID500/boot/initramfs-linux-fallback.img
        options "root=PARTUUID=048d6fcd-c88c-504d-bd51-dfc0a5bf762d rw add_efi_memmap rootflags=subvol=@/root/.refind-btrfs/rwsnap_2020-12-12_03-00-00_ID500 initrd=@\root\.refind-btrfs\rwsnap_2020-12-12_03-00-00_ID500\boot\intel-ucode.img"
    }
    submenuentry "Boot - terminal (rwsnap_2020-12-12_03-00-00_ID500)" {
        loader /@/root/.refind-btrfs/rwsnap_2020-12-12_03-00-00_ID500/boot/vmlinuz-linux
        initrd /@/root/.refind-btrfs/rwsnap_2020-12-12_03-00-00_ID500/boot/initramfs-linux.img
        options "root=PARTUUID=048d6fcd-c88c-504d-bd51-dfc0a5bf762d rw add_efi_memmap rootflags=subvol=@/root/.refind-btrfs/rwsnap_2020-12-12_03-00-00_ID500 initrd=@\root\.refind-btrfs\rwsnap_2020-12-12_03-00-00_ID500\boot\intel-ucode.img systemd.unit=multi-user.target"
    }
}

A couple of notable details are the fact that the "add_options" field (if it exists) of any given sub-menu belonging to a successive snapshot is merged with the "options" field of the corresponding snapshot's sub-menu and also the fact that the latest snapshot's sub-menus implicitly inherit those main stanza's fields which they themselves do not override in the original boot stanza. Consequently, these sub-menus' definitions are intentionally similar to those of their counterparts found in the original boot stanza.
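The inheritance and merging rules can be expressed as a small sketch which resolves the fields of a sub-menu belonging to a successive snapshot. Stanzas are modeled as plain dictionaries and the function name is hypothetical; the tool's own object model most likely differs.

```python
OVERRIDABLE_FIELDS = ("loader", "initrd", "options")


def effective_fields(main_stanza: dict, submenu: dict) -> dict:
    """Resolve the fields a sub-menu ends up with once written out in full."""
    # Start from the main stanza's fields (already aligned with the snapshot)...
    fields = {key: main_stanza[key] for key in OVERRIDABLE_FIELDS if key in main_stanza}
    # ...apply whatever the sub-menu overrides...
    for key in OVERRIDABLE_FIELDS:
        if key in submenu:
            fields[key] = submenu[key]
    # ...and merge "add_options" into the resulting "options" field.
    if submenu.get("add_options"):
        fields["options"] = f"{fields.get('options', '')} {submenu['add_options']}".strip()
    return fields
```

For the "Boot - terminal" sub-menu this yields the main stanza's "loader" and "initrd" fields along with an "options" field ending in "systemd.unit=multi-user.target".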

This is how an Arch Linux installation with three different kernels (XanMod, Stable and LTS) should appear in rEFInd (the default theme is shown) after this tool has successfully completed its job:

rEFInd Screenshot Default

Here, each manual boot stanza uses its own custom icon based on the default Arch Linux OS icon. The Btrfs logo is then also embedded into these icons (by setting this option to "embed_btrfs_logo") and the resultant icons are defined as part of their corresponding generated boot stanzas.

By using a darker theme (such as the Nord theme - shown in the following screenshot) and by using the "inverted" Btrfs logo's variant (as opposed to the "original" one, shown in the previous screenshot), the same Arch Linux installation should appear in rEFInd looking like this:

rEFInd Screenshot Nord

Implementation

Most relevant dependencies:

  • block device and ESP information is gathered using lsblk (supports JSON output)
  • mtab information is gathered using findmnt (same remark applies regarding the output)
  • all of the mentioned subvolume and snapshot operations are performed using libbtrfsutil
  • ANTLR4 was used to generate the lexer and parser required for rEFInd config files' analyses
  • Watchdog is used for the snapshot directory watching feature and is utilized in a non-recursive fashion (watches all of the configured search directories as well as directories nested under these, up to the configured maximum depth reduced by one)
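To illustrate the non-recursive watching scheme, the set of directories to watch for a single search directory can be collected as follows. This is a sketch with a hypothetical function name; it deliberately leaves out the Watchdog calls themselves.

```python
from pathlib import Path
from typing import List


def directories_to_watch(search_dir: Path, max_depth: int) -> List[Path]:
    """Collect the search directory and its subdirectories down to max_depth - 1."""
    watched = [search_dir]
    frontier = [search_dir]
    for _ in range(max_depth - 1):
        # Descend one level at a time, keeping only directories.
        frontier = [
            child
            for parent in frontier
            for child in sorted(parent.iterdir())
            if child.is_dir()
        ]
        watched.extend(frontier)
    return watched
```

Each returned path could then be registered individually with Watchdog's `Observer.schedule(handler, str(path), recursive=False)`. With Snapper's layout and a maximum depth of two, that covers /.snapshots itself and every numbered directory under it, which is enough to notice a "snapshot" subvolume appearing or disappearing one level further down.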
  • python-systemd is used for notifying systemd about the service's readiness (because its type is set to "notify") and also for logging to the journal

Shelve is used to keep track of the currently processed snapshots and also to avoid analyzing the rEFInd config file each time as it is quite an expensive task. A new analysis is performed in case the stored and actual times of modification differ (st_mtime is used for that purpose) which means that simply touching the file should also trigger a new analysis (file hashes are neither computed nor compared). This fact also explains the need for a directory in /var/lib as the database file resides in it.
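The caching idea can be sketched with the standard library alone. The key name "refind_config", the function name and the `analyze` callback are all hypothetical; the layout of the tool's real database is not described here.

```python
import os
import shelve


def load_or_analyze(db_path: str, config_path: str, analyze):
    """Reuse the stored analysis unless the config file's st_mtime has changed."""
    mtime = os.stat(config_path).st_mtime
    with shelve.open(db_path) as db:
        cached = db.get("refind_config")
        if cached is not None and cached["mtime"] == mtime:
            return cached["result"]
        # Modification times differ (or nothing is stored yet): analyze again.
        result = analyze(config_path)
        db["refind_config"] = {"mtime": mtime, "result": result}
        return result
```

Because only the modification time is compared, touching the file invalidates the stored result even when its contents are unchanged.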

The directory watching mechanism is a bit unfortunate in the sense that it is way overkill for the task at hand. Even though Watchdog is a great, battle-tested library and many people use it, I feel that this solution isn't particularly well suited to this tool but it will simply have to suffice for now as I don't have a better idea (grub-btrfs also relies on a similar mechanism), at least not until the Btrfs authors develop this useful feature or something akin to it.

Further Efforts

Currently, this tool won't clean up after itself in case, for instance, creating writable snapshots succeeds but generating a manual boot stanza from them fails (for whatever reason). The correct thing to do would be to delete these snapshots altogether (thus undoing the changes made by the previous step, or rolling back as it is often called), meaning that the whole run is considered to be successful if and only if all of the steps it performed were successful.
This behavior would then be comparable with the atomicity principle to which most database systems adhere. The previously mentioned scenario is covered in a different way by issuing a relevant warning on the next attempt to run the tool (because the writable snapshots already exist at this point in time and they aren't expected to) but also continuing to perform successive steps. This isn't a general solution, of course, but more of a workaround for this one possible scenario.
With that said, being somehow able to preview changes proposed by this tool would also be beneficial, especially after altering its configuration.

A more elaborate snapshot selection mechanism would be appreciated, comparable to what Snapper does, that is selecting a configurable number of daily, weekly, etc. snapshots to be included in the generated manual boot stanza.

Generated boot stanzas' names are initialized using a hardcoded format string which is not ideal. It would be more convenient to provide a way for users to define their own format string using a combination of predefined variables (time of the source snapshot's creation, its numerical ID, etc.) along with some entirely arbitrary parts.

But, before trying to implement any of these shiny features this project's source code should be properly documented and tests should be written for it because, presently, there aren't any. The latter is also a pretty considerable effort due to the sheer number of different test cases. Luckily, all of the external dependencies (OS commands, third-party library calls and similar) are abstracted away which means that no significant preparatory steps regarding the codebase to be tested are required beforehand.