kimono-koans/httm

usability: document how to take snapshots manually or on an interval

bketelsen opened this issue · 6 comments

My system is using zsys to take snapshots, which seem to happen when I install things with apt.

httm should provide some generalized documentation on how to create snapshots, either manually or on a schedule, so that there are more snapshots to choose from when using httm, especially given the low disk cost of snapshots in ZFS.
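For anyone new to this, a minimal sketch of taking a manual snapshot follows. The pool name `rpool` is a hypothetical placeholder; the actual `zfs` calls require root and a real ZFS system, so they are shown commented out here.

```shell
#!/bin/sh
# Sketch only: "rpool" is a hypothetical pool name; adjust for your system.
# Build a timestamped snapshot name, then take a recursive snapshot
# that httm can later browse.
DATE="$(date +%F-%T)"
SNAP="rpool@manual_${DATE}"
echo "$SNAP"
# On a real ZFS system (as root) you would then run:
#   zfs snapshot -r "$SNAP"
#   zfs list -t snapshot -r rpool    # list the snapshots httm can browse
```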

While httm doesn't provide snapshot functionality, there is sanoid by Jim Salter which can take snapshots on an interval (for example, keep 24 hours of snapshots, then maintain daily snapshots for a week, and then weekly snapshots for a month and so on). You might find that useful: https://github.com/jimsalterjrs/sanoid/
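For reference, a retention policy like the one described (24 hourly snapshots, daily snapshots for a week, and so on) might look roughly like this in sanoid's `/etc/sanoid/sanoid.conf`. The dataset name is a hypothetical placeholder, and the exact knobs should be checked against the sanoid documentation:

```
# Hypothetical dataset; adjust to your pool layout
[rpool/home]
        use_template = production

[template_production]
        frequently = 0
        hourly = 24
        daily = 7
        monthly = 1
        yearly = 0
        autosnap = yes
        autoprune = yes
```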

Agree with both of you.

@bketelsen, users may want some documentation on how I use httm as an alternative to zsys, and @CKingX, yes, sanoid is one of the tools I use.

I'll type up a gist when I have the time. Thank you both!

Took me a while: https://kimono-koans.github.io/opinionated-guide/

Thanks for waiting so long!

I have found your posts and, as a long-time ZFS user, I have been really interested in and impressed by them. Especially your "A Somewhat Opinionated Guide to Effective ZFS Snapshots", which helped me a lot to improve my snapshot strategy. So I followed your recommendations and added a script into /etc/apt/apt.conf.d to execute your /usr/local/sbin/snapPrepApt script every time I run an apt upgrade. Cool! I did that not only on my laptop, but on my server as well.
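For anyone wanting to replicate this, the hook can be a small APT configuration fragment. The filename below is a hypothetical choice; `DPkg::Pre-Invoke` runs the listed command before apt hands control to dpkg:

```
// /etc/apt/apt.conf.d/80-zfs-snapshot   (hypothetical filename)
// Run the snapshot script before apt invokes dpkg:
DPkg::Pre-Invoke { "/usr/local/sbin/snapPrepApt"; };
```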

The next morning, when I turned on my laptop after shutting it down the night before, the grub text screen came up saying it was unable to find a kernel to boot. I was surprised, because I didn't remember doing anything the previous day that could have affected my boot pool. Since I had had problems with grub and my ZFS pools a few times before, I started doing what had saved my life on those occasions, but with no luck. I was really surprised, because even after several hours I was unable to solve my laptop's problem. I went to bed that day really frustrated, and the next day, after several more hours of DuckDuckGo searching, I found the following posts:

openzfs/zfs#15261
https://savannah.gnu.org/bugs/index.php?64297

And those posts made me think about your /usr/local/sbin/snapPrepApt script and the second line inside it:

zfs snapshot -r bpool@snap_"$DATE"_prepApt

That's a snapshot of the top-level dataset of my bpool, which was exactly what was described there. So I decided to follow the directions given there: zpool destroy bpool, then recreate it and its datasets, and yes, that was it!

I changed the script line into:

zfs snapshot -r bpool/BOOT/ubuntu_xxxxxx@snap_"$DATE"_prepApt

And now it works great.

After that, I rebooted my server to see if it was affected, and yes, it was. So I modified it the same way as on my laptop, but there was also a problem restoring my bpool after recreating it following a zpool destroy. I read that zpool destroy is not a destructive command, but zfs destroy is, so after I ran zfs destroy on my bpool datasets and zpool destroy on bpool itself, it finally worked. I also changed the second line in your script, and now it's working with no issues.

For the record, I'm using Ubuntu mantic (23.10) on both my laptop and my server.

I wanted you to know this because I think it would be a good idea to modify your script with such a small change, after you experiment with what I'm describing, of course. I think new readers of your post will be very happy if you do.

Anyway, thank you very much for your posts and scripts, which I love and I'm using extensively now.

I have found your posts and, as a long-time ZFS user, I have been really interested in and impressed by them. Especially your "A Somewhat Opinionated Guide to Effective ZFS Snapshots", which helped me a lot to improve my snapshot strategy.

Great!

I wanted you to know this because I think it would be a good idea to modify your script with such a small change, after you experiment with what I'm describing, of course. I think new readers of your post will be very happy if you do.

Sorry that example script didn't work out so well for you.

For the record, I have never encountered this bug on Ubuntu 16.04, 20.04, or 22.04 and my /usr/local/sbin/snapPrepApt remains:

#!/bin/bash

# Timestamp used to tag the snapshot names, e.g. 2024-01-31-12:00:00
DATE="$( /bin/date +%F-%T )"

# Recursively snapshot the boot pool and the root pool
zfs snapshot -r bpool@snap_"$DATE"_prepApt
zfs snapshot -r rpool@snap_"$DATE"_prepApt

However, I'd be very pleased to add a note to the user on that blog entry warning them about the potential for this bug, given this bug has such a nasty failure mode. I have seen ZFS snapshots run a small bpool out of space before. Or perhaps it has something to do with the initramfs update scripts?

Anyway, thank you very much for your posts and scripts, which I love and I'm using extensively now.

You're welcome!

Thank you for replying to such an old post.

Sorry that example script didn't work out so well for you.

The problem was not the script; it's a grub bug, not yours. What I'm proposing is just a workaround in case you're affected by it.

For the record, I have never encountered this bug on Ubuntu 16.04, 20.04, or 22.04

I'm using 23.10 on both my laptop and my home server. The first has two 2 TB NVMe disks, with no mirror or raidz configuration, and 32 GB of RAM. My server has the same disks and 64 GB of RAM, but this time both disks work in raidz mode. rpool has log and cache partitions, with the logs mirrored. Two completely different configs, but grub failed miserably in both. :-)

However, I'd be very pleased to add a note to the user on that blog entry warning them about the potential for this bug, given this bug has such a nasty failure mode.

Good, thank you!

I have seen ZFS snapshots run a small bpool out of space before. Or perhaps it has something to do with the initramfs update scripts?

I don't know. I think my knowledge doesn't go that far, but it could be; there are several of them related to boot. boot-premount?

Thanks again for your magical work!