Antynea/grub-btrfs

Services fail when booting into snapshot

dhasial opened this issue · 50 comments

I think this may be related to #64.
Debian 10 Stable

Whenever I boot into a snapshot from GRUB, a bunch of services fail to start and I end up in a TTY console instead of the GUI like normal. My /boot/efi is on a separate partition, and is not a BTRFS subvol, I wonder if this is related?

My /boot/efi is on a separate partition, and is not a BTRFS subvol, I wonder if this is related?

Definitely not, this is very normal.

I can't think of a specific recommendation for what to investigate. Try to get to the root cause of why the services fail: check their status and the journal, and maybe you'll spot something.

I probably have the same problem when trying to boot snapper snapshots. My OS is Manjaro.
[Attached photo: photo_2020-07-30_21-09-14]

The snapshot you are trying to boot into is probably set as read-only.

You can check it with:

$ sudo btrfs property get -ts /path/to/snapshot/

If it returns ro=true, you have to make it writable with:

$ sudo btrfs property set -ts /path/to/snapshot/ ro false
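For scripting, the two commands above could be wrapped in small helpers. This is only a sketch: the function names `snapshot_is_readonly` and `make_snapshot_writable` are made up for illustration, and the snapshot path depends on your layout.

```shell
#!/bin/sh
# Hypothetical helpers around the two btrfs commands quoted above.

# Succeeds if the subvolume at $1 reports ro=true.
snapshot_is_readonly() {
    [ "$(btrfs property get -ts "$1" ro)" = "ro=true" ]
}

# Clears the read-only flag, but only when it is currently set.
make_snapshot_writable() {
    if snapshot_is_readonly "$1"; then
        btrfs property set -ts "$1" ro false
    fi
}

# Example (run as root):
#   make_snapshot_writable /path/to/snapshot
```

Keep in mind that a snapshot made writable this way is modified by booting into it, so it no longer serves as an untouched backup.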

While that might be the case, I am able to boot into readonly snapshots just fine, and there are cases where you want to keep them readonly (e.g. this snapshot is your only backup and you can't risk destroying it while attempting to recover your system).

Quote from #88 (comment):

They are r/o for a reason: Booting into a snapshot changes the data and might just destroy the last available backup.
Sometimes, one needs several tries until one achieves to fix a problem, so there should be an option to start over.

I have the following layout, don't know if it can help

[...]
UUID=xxx /              btrfs   subvol=@,defaults,noatime,space_cache,ssd,compress=zstd,commit=120 0 1
UUID=xxx /home          btrfs subvol=@home,defaults,noatime,space_cache,ssd,compress=zstd,commit=120 0 2
UUID=xxx /.snapshots    btrfs   subvol=@snapshots,defaults,noatime,space_cache,ssd,compress=zstd,commit=120 0 2
[...]

FWIW, in my experience one caveat that can prevent you from booting r/o snapshots is systemd trying to write the journal onto the r/o file system. You could try to mitigate this by setting Storage=volatile or Storage=none in /etc/systemd/journald.conf or /etc/systemd/journald.conf.d/*.conf.

Be aware though that you won't have any persistent logs (or any logs at all, depending on the setting you choose) then.
I ended up doing double snapshots each time via pacman hooks - the default r/o one, and a writeable snapshot of the same subvolume which I can boot into if I need to.

More info regarding the journal storage in the manual - journald.conf(5).
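A minimal sketch of the journald setting as a drop-in file, assuming the option is spelled `Storage=` under a `[Journal]` section as in journald.conf(5). The function name and the drop-in file name `volatile.conf` are arbitrary choices for illustration.

```shell
#!/bin/sh
# Writes a journald drop-in that keeps the journal in memory only.
# $1 is the drop-in directory; it defaults to the system location.
write_volatile_journal_dropin() {
    dropin_dir=${1:-/etc/systemd/journald.conf.d}
    mkdir -p "$dropin_dir" || return 1
    printf '[Journal]\nStorage=volatile\n' > "$dropin_dir/volatile.conf"
}

# Example (run as root), then restart journald or reboot:
#   write_volatile_journal_dropin
#   systemctl restart systemd-journald
```

The drop-in has to exist inside the snapshot you boot, so it only helps for snapshots taken after it was written.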

I solved the issue by moving /var to a separate subvolume

UUID=xxx /                     btrfs   subvol=@,defaults,noatime,space_cache,ssd,compress=zstd,commit=120 0 1
UUID=xxx /home            btrfs   subvol=@home,defaults,noatime,space_cache,ssd,compress=zstd,commit=120 0 2
UUID=xxx /.snapshots   btrfs   subvol=@snapshots,defaults,noatime,space_cache,ssd,compress=zstd,commit=120 0 2
UUID=xxx /var                btrfs   subvol=@var,defaults,noatime,space_cache,ssd,compress=zstd,commit=120 0 2

This was really a nice catch. I completely forgot that I also have /var/log on a separate subvolume, so that's why I am able to boot a read-only snapshot without any problems.

Thanks guys, I will document this in README!

@maximbaz Can you still boot into your read-only snapshots? If yes, is your var on a separate subvolume?

This is my layout:
@
@/boot/grub/x86_64-efi
@/var/cache
@/var/tmp
@/var/lib/machines
@/var/lib/portables
@/var/lib/libvirt/images
@home
@var-log

Now the system hangs at boot and I get this:

Oct 07 17:33:48 vagrant-btrfs-restore systemd[1]: Dependency failed for Multi-User System.
Oct 07 17:33:48 vagrant-btrfs-restore systemd[1]: Dependency failed for Graphical Interface.
Oct 07 17:33:48 vagrant-btrfs-restore systemd[1]: graphical.target: Job graphical.target/start failed with result 'dependency'.
Oct 07 17:33:48 vagrant-btrfs-restore systemd[1]: multi-user.target: Job multi-user.target/start failed with result 'dependency'.

I tested read-only snapshot boot and restore two months ago and it was working with GNOME etc. installed. It still works on a new machine if no graphical user interface is installed.

Either something has changed in the meantime or I didn't test it properly. I can boot if the complete /var is on a subvolume.

@ckotte just tested: yes, I can still boot into my read-only snapshots, and no, I don't have the entire /var on a separate subvolume. For full disclosure, I no longer use GRUB and thus this project, but my new setup is semantically very similar to what grub-btrfs is doing, so I'm pretty sure that's not important in this case.

Could you double-check that your /var/log is indeed mounted to @var-log subvolume? I'm not sure but it looks a bit suspicious that your subvolume is called @var-log and not say @/var/log.
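One way to double-check which subvolume backs a mount point is to look at the `subvol=` mount option reported by `findmnt`. A small sketch; the helper names are made up for illustration.

```shell
#!/bin/sh
# Prints the value of the subvol= option from a comma-separated
# mount option string (subvolid= is deliberately not matched).
subvol_from_options() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^subvol=//p'
}

# Prints the subvolume backing the mount point $1.
mounted_subvol() {
    subvol_from_options "$(findmnt -n -o OPTIONS "$1")"
}

# Example:
#   mounted_subvol /var/log
```

If the printed subvolume is the root subvolume (or nothing is printed because /var/log is not a separate mount), the journal is being written into the snapshot itself.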

As another idea, maybe our graphical environments differ, and your environment contains something that cannot work on a read-only filesystem, while mine can 🤔 I'm using sway, and although it is launched via systemd, it doesn't have any dependencies. Could you try to investigate which exact dependency has failed in your case?
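To narrow down which dependency failed, the "Dependency failed for ..." lines from the journal of the failed boot can be filtered out. A rough sketch, assuming the journal of that boot was persisted (for example because /var/log is on its own subvolume); the function name is hypothetical.

```shell
#!/bin/sh
# Reads journal lines on stdin and prints what is named in
# "Dependency failed for ..." messages, one entry per line.
failed_dependencies() {
    sed -n 's/^.*Dependency failed for \(.*\)\.$/\1/p'
}

# Example, after booting normally again:
#   journalctl -b -1 -p warning --no-pager | failed_dependencies
#   systemctl list-dependencies graphical.target
```

The messages just before the first "Dependency failed" line usually point at the unit that actually broke.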

For reference, here's my layout.

@maximbaz @var-log subvolume is mounted. I can read the logs of the failed boot after booting normally. @/var/log would be a nested subvolume of @ like the other nested subvolumes in /var.

I guess some services cannot deal with a read-only /var/lib anymore and therefore it fails to boot. I cannot get access to the system. I can only read the logs after booting normally. This makes fixing this issue very difficult for me.

May I ask which bootloader you are using now? I cannot find the boot loader installation in your script. It also looks like you are using encrypted /boot? I thought only grub can handle encrypted /boot!?

So I used GRUB exactly because I had this same belief as yourself, that encrypted /boot is only possible with GRUB, and without encrypted /boot you are open to evil-maid attacks, and all hope is lost.

But I researched this topic and it turned out that there are alternative approaches that are even better!


Here's how I do normal boot:

I'm using direct UEFI boot + Secure Boot now, no bootloaders like GRUB or systemd-boot - less code means fewer vulnerabilities to guard yourself against.

Here's how it works:

  1. Remove built-in Secure Boot keys, generate your own keys, leave private key on encrypted disk and register public key in BIOS (this part I had even when I was using GRUB, you probably have it as well).
  2. Keep /boot on encrypted disk, and mount unencrypted ESP FAT32 partition to /efi.

Now comes the interesting part:

  3. Generate a new .efi file that you will register in BIOS as boot target (i.e. instead of grub.efi), which consists of:

    • CPU microcode (ucode.img)
    • initramfs
    • vmlinuz
    • hardcoded kernel cmdline (that specifies exact kernel arguments to boot, including root btrfs subvolume)

All of the above components are taken from encrypted /boot, so cannot be tampered with while your computer is turned off.

  4. Sign this .efi file with your own Secure Boot key and put this one file in unencrypted /efi.

  5. Configure in your BIOS that this is the boot target (instead of GRUB)

Now evil-maid attack is not possible because the only unencrypted file is your signed .efi file, and if it is being tampered with, Secure Boot will refuse to load it.

Because cmdline is hardcoded in the image, Secure Boot also guarantees that you or attackers cannot just change it (e.g. to boot in an old subvolume).

In addition, because there are fewer steps in the process, and especially because you aren't decrypting your disk twice (as is the case with GRUB), the boot is soooooooooooo much faster, you wouldn't believe it.
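For readers who want to see roughly what "generate and sign one .efi file" can look like, here is a sketch using `objcopy` with the systemd EFI stub and `sbsign`. It is not taken from arch-secure-boot: the function name, the default stub path and the section offsets are assumptions based on the commonly documented manual procedure, and all input paths are supplied by the caller.

```shell
#!/bin/sh
# Sketch: bundle microcode, initramfs, kernel and a fixed cmdline into
# one EFI binary, then sign it with your own Secure Boot key.
# Arguments: ucode initramfs kernel cmdline_file key cert output
build_signed_image() {
    ucode=$1 initramfs=$2 kernel=$3 cmdline_file=$4
    key=$5 cert=$6 output=$7
    # Assumed stub location; override with EFI_STUB if yours differs.
    stub=${EFI_STUB:-/usr/lib/systemd/boot/efi/linuxx64.efi.stub}

    workdir=$(mktemp -d) || return 1
    # The microcode image has to come first in the combined initrd.
    cat "$ucode" "$initramfs" > "$workdir/initrd.img" || return 1

    # Fixed section offsets; very large kernels or initramfs images
    # may need different values.
    objcopy \
        --add-section .cmdline="$cmdline_file" --change-section-vma .cmdline=0x30000 \
        --add-section .linux="$kernel" --change-section-vma .linux=0x2000000 \
        --add-section .initrd="$workdir/initrd.img" --change-section-vma .initrd=0x3000000 \
        "$stub" "$workdir/unsigned.efi" || return 1

    sbsign --key "$key" --cert "$cert" --output "$output" "$workdir/unsigned.efi" || return 1
    rm -rf "$workdir"
}

# Example (run as root; every path here is a placeholder):
#   build_signed_image /boot/ucode.img /boot/initramfs.img /boot/vmlinuz \
#       /etc/kernel/cmdline /path/to/db.key /path/to/db.crt /efi/linux-signed.efi
```

Because the cmdline is embedded as a section of the signed binary, changing it invalidates the signature, which is the property described above.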


Here's how I do recovery if my main subvolume fails to boot:

  1. Generate not only the .efi file described above, but also another .efi recovery file:

    • initramfs
    • vmlinuz
    • NO microcode (in case it causes boot failures)
    • NO hardcoded cmdline (so that we can later select which subvolume to boot in)

In fact I create two such recovery .efi files, one with latest kernel and one with LTS kernel, in case boot failures are caused by kernel upgrade.

  2. Sign these .efi files as well for Secure Boot, but do NOT add them in BIOS yet - so that if you want to boot into them, you must first use your BIOS password.

Attackers cannot use this .efi file because to boot into it they need to know your BIOS password.

Evil maid attack is not possible because this image is signed with Secure Boot keys and at no point in time do we disable Secure Boot.

Because cmdline is NOT hardcoded in these recovery images, Secure Boot will let us specify a custom one, one where you specify rootflags=subvol=snapshots/123/snapshot and boot into a snapshot 123.
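As an illustration of the custom cmdline, a tiny helper could assemble it from a filesystem UUID and a snapshot number. The function name is made up, and the `snapshots/<number>/snapshot` path simply follows the example above; adjust it to your own layout.

```shell
#!/bin/sh
# Prints a kernel command line that selects snapshot number $2 on the
# btrfs filesystem with UUID $1 as the root subvolume.
snapshot_cmdline() {
    printf 'root=UUID=%s rootflags=subvol=snapshots/%s/snapshot\n' "$1" "$2"
}

# Example:
#   snapshot_cmdline "$(findmnt -n -o UUID /)" 123
```

On an encrypted setup the real cmdline would also need whatever parameters your initramfs uses to unlock the disk.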


There are a couple of projects that automate most of these steps, but none of them satisfied my needs precisely, so I chose to build my own tool that does exactly what I want 😄

It might be very much tailored to my personal needs and although feedback is welcome I'm not sure that I want to make it more generic (again, the less code there is, the more I can trust it).

But if you want to have a look, it is available on Github: arch-secure-boot

It automates all of the steps described above, plus contains a couple of extra integrations that I personally use:

  • pacman hooks (similar to snap-pac-grub, so I can enable the tool and forget about it)
  • integration with fwupd
  • custom EFI shell (because I use Dell laptop, and their built-in implementation is buggy)
  • a simple script that provides a UI for selecting the snapshot to boot into, so that I don't need to type the cmdline, just boot into recovery and lazily select the snapshot, almost as easy as grub-btrfs 😄

And here's how I configure the tool during OS installation, it's just one line to execute, and I don't need to remember about this ever again.


In order not to spam people with unwanted notifications, if you have some questions or feedback or just want to chat more about this setup, please open an issue in arch-secure-boot 👍

@maximbaz Thank you very much for your detailed explanation! This sounds interesting. I also have a Dell laptop and I don't like their EFI implementation either. I will try it out once I have figured out a working setup for my read-only snapshots. 😉

I found out why I'm not able to boot read-only snapshot anymore. It's an issue with gdm and accounts-daemon.

The errors about the other services go away if I use subvolumes for /var/cache and /var/tmp. However, the system still hangs and doesn't provide a login. It started probably with gdm 3.36. I can boot into read-only snapshots again if I create subvolumes for /var/lib/gdm/ and /var/lib/AccountsService.

Not sure if I want to implement this. Using a subvolume for /var is probably the best option when using GNOME, with the disadvantage of a writeable /var when booting the read-only snapshot. I'm also not sure if I can boot the read-only snapshot if /var is corrupted...

hi all,
maybe you would be interested in a custom "hook" for the initramfs, which disables the "read-only" property
(for Arch linux initramfs only)
how to:
Create a file in /etc/initcpio/install/disablesnapro
Add this:

#!/bin/bash

build() {
    add_module btrfs
    add_binary btrfs
    add_binary btrfsck
    add_runscript
}

help() {
    cat <<HELPEOF
This hook sets the property ro=false on the snapshot via the
"btrfs property set /new_root ro false" command
so that a read-only snapshot boots without errors.
HELPEOF
}

# vim: set ft=sh ts=4 sw=4 et:

Create a file in /etc/initcpio/hooks/disablesnapro
Add this:

#!/usr/bin/ash

run_latehook() {
    btrfs property set /new_root ro false
}

# vim: set ft=sh ts=4 sw=4 et:

Edit your /etc/mkinitcpio.conf
Add the custom hook at the end of the HOOKS array.
HOOKS=(base udev autodetect modconf block filesystems keyboard fsck disablesnapro)
Then regenerate your initramfs to include this hook
via the mkinitcpio command
(e.g. mkinitcpio -P for all presets in /etc/mkinitcpio.d).

Your snapshots must contain this hook,
so it will not work on previously created snapshots.
(It should work if your /boot partition is not included in your snapshots.)

Edit: disablesnapro can be renamed, but the name must be identical in /etc/initcpio/hooks and /etc/initcpio/install.
See the HOOKS section of the Arch Linux wiki.

This is actually a great idea @Antynea so thx for this !
I was using a customized snap-pac which changed the readonly-property after creating the snapshots. Using an mkinitcpio hook appears much cleaner to me.

I don't like the idea that a program changes this property.
There can be several reasons (especially security) for a snapshot to be only in read-only mode.
Grub doesn't allow this kind of practice, which is why it isn't possible to do so.
This feature isn't included in the kernel, I couldn't find a reference on the btrfs mailing-list.
(Certainly for the same reason mentioned by the grub project)
So our only possibility is to modify the initramfs.
grub-btrfs will never do this for you, this remains the responsibility of the user.

About /var.
According to the Linux Foundation,
the majority of its content should be available for writing.
Refer to :

If you don't want a program to reside in /var, refer to its own documentation.

Just like @ckotte does, it is possible to specify different subvolumes for each directory residing in /var.
This is tedious, and you have to learn how each program and/or your distribution uses /var.

If you have a separate subvolume for /var then you will have "inconsistencies" when booting the snapshot of /. For example, the pacman database contains all the updated packages after a system upgrade but the packages are not installed in the snapshot.

I couldn't find examples, but maybe something can corrupt /var and you cannot boot anymore or a few services will fail, etc. pp. If /var is a subvolume, then you would still have the same issues when booting the snapshot of /.

It's not a big deal, because when you do a snapshot rollback of / then you just have to rollback the corresponding snapshot of /var as well, but I want to have the old state of /var when booting into a snapshot of /.

@Antynea Thanks for sharing the initramfs hook. Now I can boot without dedicated subvolumes for gdm and AccountsService or even without a dedicated subvolume for /var!

What you are talking about is correct.

but I want to have the old state of /var when booting into a snapshot of /.

Any automation should be done from the initramfs.
It would be necessary to create a script or "hook" which loads the /var snapshot corresponding to the state of the root snapshot.
Such a script is complicated to design, but not impossible.
The names of the snapshots created for root and var must be consistent.

Quote from #88 (comment):

They are r/o for a reason: Booting into a snapshot changes the data and might just destroy the last available backup.
Sometimes, one needs several tries until one achieves to fix a problem, so there should be an option to start over.

I modified the hook to reflect the quote above:
If the subvolume/snapshot is read-only,
create a read-write copy of the snapshot and boot into it; otherwise boot normally.
(The copy is created in the root of the btrfs filesystem (subvolid=5).)

/etc/initcpio/hooks/switchsnaprotorw

#!/usr/bin/ash

run_hook() {
	local current_dev=$(resolve_device "$root"); # resolve devices for blkid
	if [[ $(blkid ${current_dev} -s TYPE -o value) = "btrfs" ]]; then
		current_snap=$(mktemp -d); # create a random mountpoint in root of initramfs
		mount -t btrfs -o ro,"${rootflags}" "$current_dev" "${current_snap}";
		if [[ $(btrfs property get "${current_snap}" ro) != "ro=false" ]]; then # check if the snapshot is in read-only mode
			snaproot=$(mktemp -d);
			mount -t btrfs -o rw,subvolid=5 "${current_dev}" "${snaproot}";
			rwdir=$(mktemp -d)
			mkdir -p ${snaproot}${rwdir} # create a random folder in root fs of btrfs device
			btrfs sub snap "${current_snap}" "${snaproot}${rwdir}/rw";
			umount "${current_snap}";
			umount "${snaproot}"
			rmdir "${current_snap}";
			rmdir "${snaproot}";
			rootflags=",subvol=${rwdir}/rw";
		else
			umount "${current_snap}";
			rmdir "${current_snap}";
		fi
	fi
}

/etc/initcpio/install/switchsnaprotorw

#!/bin/bash

build() {
    add_module btrfs
    add_binary btrfs
    add_binary btrfsck
    add_binary blkid
    add_runscript
}

help() {
    cat <<HELPEOF
This hook creates a writable copy of a read-only snapshot before boot.
HELPEOF
}

# vim: set ft=sh ts=4 sw=4 et:

The code isn't pretty, but it works.

To prevent any problems during my tests,
I disabled the hook for generating the initramfs fallback.
/etc/mkinitcpio.d/linux.preset
fallback_options="-S autodetect,switchsnaprotorw"

Could you add code to automatically delete the rw snapshot during shutdown? Maybe add a script to ${rwdir}/rw/usr/lib/systemd/system-shutdown?

@ckotte
If we destroy the running rw snapshot, this will break the shutdown,
because it is equivalent to doing an rm -f / on a running system.

A different approach would be to use an overlayfs.
I'll think about it.

@ckotte
If we destroy the running rw snapshot, this will break the shutdown,
because it is equivalent to doing an rm -f / on a running system.

The root fs is already mounted ro at this stage. However, /oldroot cannot even be unmounted because it's still busy. The scripts in ../system-shutdown are also executed before this step.

A script that's executed at every boot should work: delete all subvolumes in /tmp if we did not boot from a snapshot located in this directory.
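A rough sketch of such a boot-time cleanup, assuming the writable copies live at `tmp/<random>/rw` below the top-level subvolume, as created by the switchsnaprotorw hook above. The function names and the /mnt mount point are made up for illustration.

```shell
#!/bin/sh
# Succeeds if the mount options in $1 reference a subvolume under tmp/,
# i.e. the system was booted from one of the temporary rw copies.
is_temp_snapshot_root() {
    case ",$1," in
        *,subvol=/tmp/*|*,subvol=tmp/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Deletes leftover temporary snapshots below "$1/tmp", where $1 is a
# mount of the top-level subvolume (subvolid=5).
cleanup_temp_snapshots() {
    top=$1
    for snap in "$top"/tmp/*/rw; do
        [ -d "$snap" ] || continue
        btrfs subvolume delete "$snap" && rmdir "${snap%/rw}"
    done
}

# Example for a script run as root at every boot:
#   if ! is_temp_snapshot_root "$(findmnt -n -o OPTIONS /)"; then
#       dev=$(findmnt -n -o SOURCE / | sed 's/\[.*//')
#       mount -o subvolid=5 "$dev" /mnt
#       cleanup_temp_snapshots /mnt
#       umount /mnt
#   fi
```

The check uses the mount options of / rather than /proc/cmdline, because the hook changes rootflags inside the initramfs and the kernel command line still shows the original value.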

New approach using overlayfs.

/etc/initcpio/install/overlaysnapro

#!/bin/bash

build() {
    add_module btrfs
    add_module overlay
    add_binary btrfs
    add_binary btrfsck
    add_binary blkid
    add_runscript
}

help() {
    cat <<HELPEOF
This hook uses overlayfs to boot on a read only snapshot.
HELPEOF
}

# vim: set ft=sh ts=4 sw=4 et:

/etc/initcpio/hooks/overlaysnapro

#!/usr/bin/ash

run_latehook() {
	local root_mnt="/new_root"
	local current_dev=$(resolve_device "$root"); # resolve devices for blkid
	if [[ $(blkid "${current_dev}" -s TYPE -o value) = "btrfs" ]] && [[ $(btrfs property get ${root_mnt} ro) != "ro=false" ]]; then # run only on a read only snapshot
		local lower_dir=$(mktemp -d -p /)
		local ram_dir=$(mktemp -d -p /)
		mount --move ${root_mnt} ${lower_dir} # move new_root to lower_dir
		mount -t tmpfs cowspace ${ram_dir}  #meuh!!! space, you can't test !
		mkdir -p ${ram_dir}/upper
		mkdir -p ${ram_dir}/work
		mount -t overlay -o lowerdir=${lower_dir},upperdir=${ram_dir}/upper,workdir=${ram_dir}/work rootfs ${root_mnt}
	fi
}

Now we have a live system based on snapshots.

Where are the temp directories lower_dir and ram_dir created? In the initramfs or where? I just would like to see the changes on top of the ro snapshot.

Everything is done in the initramfs.
The read-only snapshot remains unchanged.
The system will start just like a live CD in non-persistent mode.
You can repair your main subvolume, or you can restore it.
Any changes made while the snapshot is running will be lost after a restart/shutdown.
It's the most elegant solution, which I officially adopt.

I think this is just genius. Depending on your subvolume layout, it also enables you to use your main OS as kind of an "amnesic" live system, similar to TAILS but based on your own setup.

Guess I like the overlayfs idea even more than the OpenSUSE way of using the btrfs "default subvolume" and snapper rollback, because it's also much more convenient. So thx a lot @Antynea for sharing this here! 👍

Thank you for your support.
I will include this in the project.

howdy, folks!

I decided to include the overlayfs-based functionality in the project.
You can find out more here.
I haven't released a new version yet, so you'll have to clone the repository.

If you decide to try it, please give me feedback.
Thank you all.

So I've adapted that solution after you posted it here, and haven't found any major caveats whatsoever.

One thing I noticed, though it doesn't break anything, is that the snapper-{boot,timeline,cleanup} services are running inside the overlayfs and thus spitting out error messages to the journal.

Not sure if it's important anyways, but maybe there's a way to deactivate those units (and/or the according one from timeshift) if present ?

The overlay hook works well with nested subvolumes in /var.
For example:

@
@/boot/grub/x86_64-efi
@/var/cache
@/var/tmp
@/var/lib/machines
@/var/lib/portables
@/var/lib/libvirt/images
@home
@var-log

The nested subvolumes are not included in the snapshot and the folders are empty, but files and folders can be created via overlayfs.

There's just one Docker error, but I can create and start containers:

kernel: overlayfs: filesystem on '/var/lib/docker/check-overlayfs-support272250815/upper' not supported as upperdir
kernel: overlayfs: filesystem on '/var/lib/docker/check-overlayfs-support215908370/upper' not supported as upperdir

The temporary snapshot requires subvolumes for /var/tmp and /var/cache to get rid of all errors.
For example:

@
@/boot/grub/x86_64-efi
@/var/lib/machines
@/var/lib/portables
@/var/lib/libvirt/images
@home
@var-log
@var-cache
@var-tmp

The nested subvolumes are not included in the snapshot and the folders are empty. Although the snapshot is rw, it's not possible to create files or folders in cache and/or tmp. I forgot which one it was. Some services still log errors and nested subvolumes are required to get rid of all errors.

I updated the temp snapshot hook to delete the temp snapshots at every reboot. I will probably use the temp snapshot hook instead of overlayfs because it's basically the same config and overlayfs adds an additional layer that could potentially cause issues in the future. At the end it probably doesn't matter much because I just want to be able to boot into a snapper snapshot if something is broken. It's not a big deal if I have errors when booting into a snapshot...

@Kr1ss-XD

Not sure if it's important anyways, but maybe there's a way to deactivate those units (and/or the according one from timeshift) if present ?

I use custom libalpm hooks for this:

/usr/share/libalpm/scripts/snap-pac-timers

#!/bin/bash
# Ansible managed

readonly pre_or_post=$1

if [[ "$pre_or_post" == "pre" ]]; then
  # snapper
  systemctl disable snapper-timeline.timer &> /dev/null
  systemctl stop snapper-timeline.timer
  printf "==> snapper-timeline.timer\n"
  systemctl disable snapper-cleanup.timer &> /dev/null
  systemctl stop snapper-cleanup.timer
  printf "==> snapper-cleanup.timer\n"
  # btrbk
  systemctl disable btrbk-hourly.timer &> /dev/null
  systemctl stop btrbk-hourly.timer
  printf "==> btrbk-hourly.timer\n"
  systemctl disable btrbk-daily.timer &> /dev/null
  systemctl stop btrbk-daily.timer
  printf "==> btrbk-daily.timer\n"
else
  # snapper
  systemctl enable snapper-timeline.timer &> /dev/null
  systemctl start snapper-timeline.timer
  printf "==> snapper-timeline.timer\n"
  systemctl enable snapper-cleanup.timer &> /dev/null
  systemctl start snapper-cleanup.timer
  printf "==> snapper-cleanup.timer\n"
  # btrbk
  systemctl enable btrbk-hourly.timer &> /dev/null
  systemctl start btrbk-hourly.timer
  printf "==> btrbk-hourly.timer\n"
  systemctl enable btrbk-daily.timer &> /dev/null
  systemctl start btrbk-daily.timer
  printf "==> btrbk-daily.timer\n"
fi

/usr/share/libalpm/hooks/00_snapper-a-pre-timers.hook

# Ansible managed

[Trigger]
Operation = Upgrade
Operation = Install
Operation = Remove
Type = Package
Target = *

[Action]
Description = Disable and stop systemd timers before performing snapper pre snapshots...
Depends = snap-pac
When = PreTransaction
Exec = /usr/share/libalpm/scripts/snap-pac-timers pre
AbortOnFail

/usr/share/libalpm/hooks/zy_snapper-post-timers.hook

# Ansible managed

[Trigger]
Operation = Upgrade
Operation = Install
Operation = Remove
Type = Package
Target = *

[Action]
Description = Enable and start systemd timers after performing snapper post snapshots...
Depends = snap-pac
When = PostTransaction
Exec = /usr/share/libalpm/scripts/snap-pac-timers post

What I'm currently testing is an additional

# disable snapper timers if any are active
rm -f /new_root/etc/systemd/system/timers.target.wants/snapper-*.timer

in /etc/initcpio/hooks/grub-btrfs-overlayfs, just after mounting the new root.

Your way is probably the better one since it should not be grub-btrfs's duty to take care of snapper timers (users also might have other ones like timeshift, btrbk and who knows what). So thank you for your comment !

Thank you for your feedback, it is much appreciated.

It is quite normal to have errors for the snapper,docker,timeshift,etc. services because if they are enabled, they expect to be on a btrfs filesystem, but this is not the case, the filesystem is overlayfs.
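Scripts or units that should not run in this situation could check the filesystem type of / first. A minimal sketch; the function name is hypothetical.

```shell
#!/bin/sh
# Succeeds when / is an overlayfs mount, as it is after the overlay
# hook has run on a read-only snapshot.
root_is_overlay() {
    [ "$(findmnt -n -o FSTYPE /)" = "overlay" ]
}

# Example guard at the top of a snapshot-related script:
#   if root_is_overlay; then
#       echo "overlay root detected, skipping" >&2
#       exit 0
#   fi
```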

I don't think I can handle these cases, because the main priority of the hook is to be able to boot into a read-only snapshot (which remains unmodified), and to be able to perform tasks that repair the main subvolume.
It was not designed to boot into a twin system of the main subvolume.

However, I'm open to a tips section on the documentation, which could contain your personal tips, if you're interested.

The nested subvolumes are not included in the snapshot and the folders are empty

It's very strange, everything in my fstab is well set up at startup.

Check whether a hook that should be executed isn't being executed.
Also check whether your fstab file specifies your mount points by UUID rather than by device name (e.g. /dev/sdx).

# disable snapper timers if any are active
rm -f /new_root/etc/systemd/system/timers.target.wants/snapper-*.timer

You cannot perform rm on a read-only file system.

I do it inside the overlayfs. Just after it's been mounted.

#!/usr/bin/ash

run_latehook() {
    # resolve devices for blkid
    local current_dev=$(resolve_device "$root")
    # run only when booting a read-only btrfs subvolume
    if [[ $(blkid "${current_dev}" -s TYPE -o value) = "btrfs" ]] && [[ $(btrfs property get /new_root ro) != "ro=false" ]]; then
        local lower_dir=$(mktemp -d -p /)
        local ram_dir=$(mktemp -d -p /)
        # move new_root to lower_dir
        mount --move /new_root ${lower_dir}
        # meuh!!! space, you can't test !
        mount -t tmpfs cowspace ${ram_dir}
        mkdir -p ${ram_dir}/upper
        mkdir -p ${ram_dir}/work
        mount -t overlay -o lowerdir=${lower_dir},upperdir=${ram_dir}/upper,workdir=${ram_dir}/work rootfs /new_root
        # disable snapper timers
        rm -f /new_root/etc/systemd/system/timers.target.wants/snapper-*.timer
    fi
}

That's what I'm testing right now. Nevertheless, I think it's better to have the timers disabled at snapshot time in the first place, as @ckotte suggested.

I do it inside the overlayfs

You can make any changes you like.
nvm, I replied too quickly ...

@Antynea

The nested subvolumes are not included in the snapshot and the folders are empty

It's very strange, everything in my fstab is well set up at startup.

Check whether a hook that should be executed isn't being executed.
Also check whether your fstab file specifies your mount points by UUID rather than by device name (e.g. /dev/sdx).

JFYI. There's a difference between subvolumes (subvolumes inside top-level subvolume) and nested subvolumes (subvolumes inside subvolumes). Nested subvolumes are mounted automatically but excluded from the subvolume snapshot. Because they are not included in the snapshot, the folders (nested subvolume mountpoints) are empty in the snapshot. However, if the rw snapshot is booted, files/folders cannot be created inside those "mountpoints" in the rw snapshot. The "mountpoint" needs to be deleted first. This doesn't matter if overlayfs is used. It only matters if a rw snapshot is used.

I want to use a combination of subvolumes (do snapshots) and nested subvolumes (exclude data from snapshots) because I just don't want so many mountpoints in my fstab.

I want to use a combination of subvolumes (do snapshots) and nested subvolumes (exclude data from snapshots) because I just don't want so many mountpoints in my fstab.

To my knowledge, this isn't officially possible.
Edit: All feedback is very much appreciated, thank you.

I think this is a misunderstanding. What I do is basically a mixed layout: https://btrfs.wiki.kernel.org/index.php/SysadminGuide#Mixed

Create subvolume

btrfs subvolume create /mnt/@

Create nested subvolume

mkdir -p /mnt/@/boot/grub
btrfs subvolume create /mnt/@/boot/grub/x86_64-efi
@ -> subvolume + snapshots configured + fstab
@/boot/grub/x86_64-efi -> nested subvolume + excluded from @ snapshots + automatically mounted
@/var/lib/machines -> nested subvolume + excluded from @ snapshots + automatically mounted
@/var/lib/portables -> nested subvolume + excluded from @ snapshots + automatically mounted
@/var/lib/libvirt/images -> nested subvolume + excluded from @ snapshots + automatically mounted
@home -> subvolume + snapshots configured + fstab
@var-log -> subvolume + snapshots configured + fstab
@var-cache -> subvolume + no snapshots configured + fstab
@var-tmp -> subvolume + no snapshots configured + fstab
@snapshots/root
@snapshots/home
etc. pp.

I think this is a misunderstanding

I think that due to overlayfs, the btrfs filesystem can't do the automatic assembly by itself.
That's why I replied to you

To my knowledge, this isn't officially possible.

However, a question arises.
When mounting new_root by initramfs, btrfs mounts the nested snapshots at that moment, would my hook invalidate this feature ?
Or

Because they are not included in the snapshot

Nested subvolumes are not included without the overlayfs or temp snapshot hook. The nested subvolumes are not included in the ro snapshot and therefore also not in the rw snapshot or overlayfs.

There must be an issue with the old folders where the nested subvolumes were located. They must be deleted and re-created. Otherwise, you cannot create files or create new nested subvolumes (after snapshot rollback).
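The delete-and-recreate step could be sketched like this. The function name is made up, and the example path is just one of the nested subvolumes from the layout above.

```shell
#!/bin/sh
# Replaces the empty directory left behind at $1 with a fresh nested
# subvolume.
recreate_nested_subvolume() {
    path=$1
    if [ -d "$path" ]; then
        # rmdir refuses to remove non-empty directories, so existing
        # data is never discarded by accident.
        rmdir "$path" || return 1
    fi
    btrfs subvolume create "$path"
}

# Example (run as root after a rollback):
#   recreate_nested_subvolume /var/lib/machines
```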

When mounting new_root by initramfs, btrfs mounts the nested snapshots at that moment, would my hook invalidate this feature ?

The snapshot is mounted fine. Only the nested subvolumes are missing because they are not included in the snapshot. This works as intended.

Your hooks work fine. I tested all 3 of them with different configurations. I just added the comment about the nested subvolumes if someone has the same "issues" with nested subvolumes. 😉

Thank you very much for all these clarifications.

@Antynea I'm not sure if I should create a separate issue for this. I'm posting here first because this is very much related to this issue.

First of all, thanks a lot for implementing the overlayfs based solution. It was a genius idea! I had recently switched to systemd-boot but switched back to GRUB after I came to know that you have implemented the overlayfs solution and now snapshots can be properly booted without any errors. This is clearly game-changing!

But I was surprised to find out yesterday that systemd already has a built-in feature for this and if I understand it correctly, it should simplify a lot of what you had to do to get overlayfs working. I think it would eliminate the need to add anything to initramfs.

Take a look at this: https://www.freedesktop.org/software/systemd/man/systemd-volatile-root.html

systemd-volatile-root.service is a service that replaces the root directory with a volatile memory file system ("tmpfs"), mounting the original (non-volatile) /usr/ inside it read-only. This way, vendor data from /usr/ is available as usual, but all configuration data in /etc/, all state data in /var/ and all other resources stored directly under the root directory are reset on boot and lost at shutdown, enabling fully stateless systems.

This service is only enabled if full volatile mode is selected, for example by specifying "systemd.volatile=yes" on the kernel command line. This service runs only in the initial RAM disk ("initrd"), before the system transitions to the host's root directory. Note that this service is not used if "systemd.volatile=state" is used, as in that mode the root directory is non-volatile.

Doesn't it sound like this is EXACTLY what grub-btrfs needed?

It's really unfortunate that such a cool feature isn't better known and more widely used! It was released with systemd version 233 on 2017-03-01. See release notes here.

Also, I wanted to point out that the current overlayfs implementation of grub-btrfs has a small drawback: when you are booted into a snapshot, the root filesystem is overlayfs, so snapper no longer works for the root configuration in that live environment. Yes, of course, you can manually mount your btrfs / to /mnt and manipulate the subvolumes by hand, but it would be nice if we could just use the snapper commands, especially snapper rollback.
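
For anyone scripting around this, here is a minimal sketch (my own illustration, not part of grub-btrfs) of how to tell which case you are in. The function name root_fstype is hypothetical. It reads /proc/mounts-style input and prints the filesystem type of whatever is mounted on /, so "overlay" means you are in the overlayfs environment and "btrfs" means snapper should be able to see the root subvolume.

```shell
# Print the filesystem type of the mount at "/" from /proc/mounts-style input.
# The last matching entry wins, because a later mount on the same path
# shadows earlier ones (e.g. the initial rootfs entry).
root_fstype() {
    awk '$2 == "/" { fstype = $3 } END { if (fstype != "") print fstype }' \
        "${1:-/proc/mounts}"
}

# Example use on a live system:
#   case "$(root_fstype)" in
#       overlay) echo "root is overlayfs, snapper rollback will not work here" ;;
#       btrfs)   echo "root is btrfs" ;;
#   esac
```

On a running system, `findmnt -n -o FSTYPE /` should give the same answer.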

systemd gives you both options:

  1. systemd.volatile=state: This should solve the problem I stated above.

If set to state the generator will leave the root directory mount point unaltered, however will mount a "tmpfs" file system to /var/. In this mode the normal system configuration (i.e. the contents of "/etc/") is in effect (and may be modified during system runtime), however the system state (i.e. the contents of "/var/") is reset at boot and lost at shutdown.
(Source)

  2. systemd.volatile=overlay: This seems like what you're doing now, except systemd will take care of everything.

If this setting is set to "overlay" the root file system is set up as "overlayfs" mount combining the read-only root directory with a writable "tmpfs", so that no modifications are made to disk, but the file system may be modified nonetheless with all changes being lost at reboot.
(Source)

@keyb0ardninja

I knew about systemd-volatile-root.service, but I have encountered some limitations:

  • The systemd.volatile=yes and systemd.volatile=overlay settings don't allow a read-only snapshot to boot properly in a virtual machine (tested with Fedora 35, Manjaro, and Arch Linux).
  • systemd.volatile=state does allow a read-only snapshot to boot correctly in a virtual machine, except that:
    The booted filesystem is still read-only: {/root,/lib,/mnt,...} all keep the ro flag.
    (We can perform some actions, but cannot, for example, install new packages.)
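
A quick way to confirm the ro flag after booting a snapshot is to look at the mount options of /. The helper below is only a sketch; the name has_ro_flag is made up for illustration. It checks a comma-separated option string for an exact "ro" entry, so options such as errors=remount-ro do not produce a false match.

```shell
# Succeed (exit status 0) if a comma-separated mount option string
# contains the exact option "ro".
has_ro_flag() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example use on a live system:
#   if has_ro_flag "$(findmnt -n -o OPTIONS /)"; then
#       echo "root is mounted read-only"
#   fi
```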

Integrating overlayfs inside the initramfs works around this problem.
Unfortunately, as you mentioned, the root file system is then no longer btrfs but overlayfs.
This is a problem for the snapper restoration tool, but I don't recommend using it.

However, I agree to integrate systemd.volatile=state.
This provides a more user-friendly approach and should work on all distros using systemd.

  • Should it be enabled by default ?
    I doubt it, it will break compatibility with another init system (OpenRC).
    A configurable option to enable it should be in the config file.

  • Should it be enabled for all snapshots, read-only and read-write mode ?
    I would say no.
    systemd.volatile=state should only be present on read-only snapshots.
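
To make the proposal concrete, the logic could look roughly like the sketch below. This is not the actual grub-btrfs code: the variable ENABLE_SYSTEMD_VOLATILE and the function snapshot_cmdline are hypothetical names, and the kernel command line in the example is a placeholder. The parameter is appended only when the option is switched on and the snapshot is read-only.

```shell
# Hypothetical config switch, off by default so that other init systems
# (e.g. OpenRC) are not affected.
: "${ENABLE_SYSTEMD_VOLATILE:=false}"

# snapshot_cmdline BASE_CMDLINE SNAPSHOT_IS_READONLY
# Prints the kernel command line for a snapshot menu entry.
# SNAPSHOT_IS_READONLY is "true" or "false".
snapshot_cmdline() {
    if [ "$ENABLE_SYSTEMD_VOLATILE" = "true" ] && [ "$2" = "true" ]; then
        printf '%s systemd.volatile=state\n' "$1"
    else
        printf '%s\n' "$1"
    fi
}

# Example use (placeholder command line):
#   ENABLE_SYSTEMD_VOLATILE=true
#   snapshot_cmdline "root=UUID=xxx rootflags=subvol=/path/to/snapshot" true
```

The read-only state itself can be taken from `btrfs property get -ts /path/to/snapshot/ ro`, which prints ro=true for read-only snapshots.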

Regards.

  • systemd.volatile=state does allow a read-only snapshot to boot correctly in a virtual machine, except that:
    The booted filesystem is still read-only: {/root,/lib,/mnt,...} all keep the ro flag.
    (We can perform some actions, but cannot, for example, install new packages.)

You can keep your current implementation as well as provide the systemd option, so that users can choose which one they prefer.

  • Should it be enabled by default ?
    I doubt it, it will break compatibility with another init system (OpenRC).
    A configurable option to enable it should be in the config file.

This can be taken care of in packaging, right? So, the Arch Linux package can have it enabled by default because only systemd is officially supported in Arch, while Void Linux should not have it enabled by default.

  • Should it be enabled for all snapshots, read-only and read-write mode ?
    I would say no.
    systemd.volatile=state should only be present on read-only snapshots.

Yes, absolutely. I actually really appreciate that your current implementation also has this behaviour. One of my use-cases is quickly creating a twin of my current system via a read-write snapshot to mess around with, and discarding it only when I no longer need it, which means my changes must persist across reboots. So read-write snapshots should not use overlayfs, especially since it is not essential for them.

For those who tried to configure the hook with overlayfs and had trouble: it will not work with existing snapshots because the linux image for those doesn't use the newly added hook (obviously).

If you want to try it out then you need to create a new snapshot after you run mkinitcpio -P.
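
As a rough sketch of that sequence, assuming an Arch-style setup with mkinitcpio and a snapper config named "root" (both are assumptions about your system, and the description text is arbitrary). The run helper and the DRY_RUN switch are my own additions so that the script only prints the commands by default; set DRY_RUN=0 and run it as root to actually execute them.

```shell
# Dry-run by default: print the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Rebuild the initramfs images so they include the new hook, then take a
# fresh snapshot that contains the rebuilt images.
refresh_snapshot() {
    run mkinitcpio -P
    run snapper -c root create --description "after adding the overlayfs hook"
}

refresh_snapshot
```

Only snapshots taken after this point can be expected to boot with the hook in place.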

You know, instead of testing your workarounds that you affectionately call 'hooks', it would be much easier to place a small recovery ISO in the boot partition and add it to GRUB. Strange people! After 4 years, they haven't found an optimal, reasonable, OS-independent solution!