ubuntu/zsys

gc cannot destroy snapshot due to 2 pools being mounted as home

sylvainfaivre opened this issue · 3 comments

Describe the bug
Fully updated Ubuntu 22.10 machine. First installed with Ubuntu 21.10.
My setup is fully encrypted.

Trying to run gc:

❯ sudo zsysctl service gc                                     
ERROR Couldn't fully destroy user state rpool/USERDATA/sylvain_loeo0c: Couldn't destroy rpool/USERDATA/sylvain_loeo0c: couldn't destroy "rpool/USERDATA/sylvain_loeo0c" and its children: cannot destroy dataset "rpool/USERDATA/sylvain_loeo0c": dataset is busy.
Putting it in keep list. 
ERROR Couldn't fully destroy user state rpool/USERDATA/sylvain_qn532u@autozsys_qjwbla: Couldn't destroy rpool/USERDATA/sylvain_qn532u@autozsys_qjwbla: couldn't destroy "rpool/USERDATA/sylvain_qn532u@autozsys_qjwbla" due to clones: "rpool/USERDATA/sylvain_qn532u@autozsys_qjwbla" has some clones ([rpool/USERDATA/sylvain_loeo0c]) when it shouldn't.
Putting it in keep list. 
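The second error already points at the culprit: rpool/USERDATA/sylvain_loeo0c is a clone of the autozsys_qjwbla snapshot, so that snapshot cannot be destroyed while the clone exists. The relationship can be checked with plain ZFS properties (nothing zsys-specific; just a sketch, not verified on this exact setup):

❯ sudo zfs get -r origin rpool/USERDATA
❯ sudo zfs get clones rpool/USERDATA/sylvain_qn532u@autozsys_qjwbla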

Listing all snapshots on this pool:

❯ sudo zfs list -rt all rpool/USERDATA
NAME                                            USED  AVAIL     REFER  MOUNTPOINT
rpool/USERDATA                                  151G  16.2G      192K  /
rpool/USERDATA/root_qn532u                     4.59G  16.2G     4.59G  /root
rpool/USERDATA/root_qn532u@autozsys_ktrxrn      320K      -     4.59G  -
rpool/USERDATA/root_qn532u@autozsys_v2erx1       96K      -     4.59G  -
rpool/USERDATA/root_qn532u@autozsys_9q50s0      280K      -     4.59G  -
rpool/USERDATA/sylvain_loeo0c                  41.4G  16.2G      124G  /home/sylvain
rpool/USERDATA/sylvain_qn532u                   105G  16.2G      105G  /home/sylvain
rpool/USERDATA/sylvain_qn532u@autozsys_qjwbla  3.96M      -      105G  -
rpool/USERDATA/sylvain_qn532u@autozsys_ktrxrn     0B      -      105G  -
rpool/USERDATA/sylvain_qn532u@autozsys_v2erx1     0B      -      105G  -
rpool/USERDATA/sylvain_qn532u@autozsys_9q50s0     0B      -      105G  -
rpool/USERDATA/sylvain_qn532u@autozsys_ga1sja     0B      -      105G  -
rpool/USERDATA/sylvain_qn532u@autozsys_j9uroz     0B      -      105G  -
rpool/USERDATA/sylvain_qn532u@autozsys_3929kg     0B      -      105G  -

As you can see, two datasets are mounted as my home:

❯ mount |grep home
rpool/USERDATA/sylvain_qn532u on /home/sylvain type zfs (rw,relatime,xattr,posixacl)
rpool/USERDATA/sylvain_loeo0c on /home/sylvain type zfs (rw,relatime,xattr,posixacl)

I'm not quite sure how I can check which one really contains my data, and how I can fix my setup so that only one shows as mounted.

Well, the _loeo0c one seems to be a leftover from something old: it is smaller and doesn't match the name of any snapshot.
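My best guess for telling the two apart (untested, so take it with a grain of salt): findmnt should show how the two mounts are stacked on /home/sylvain, and zfs get shows where the leftover clone comes from and how old each dataset is:

❯ findmnt /home/sylvain
❯ sudo zfs get origin,creation,used rpool/USERDATA/sylvain_loeo0c rpool/USERDATA/sylvain_qn532u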

See also:

❯ sudo zsysctl show                                           
Name:           rpool/ROOT/ubuntu_bcuvug
ZSys:           true
Last Used:      current
History:        
  - Name:       rpool/ROOT/ubuntu_bcuvug@autozsys_9q50s0
    Created on: 2023-01-12 10:03:01
  - Name:       rpool/ROOT/ubuntu_bcuvug@autozsys_v2erx1
    Created on: 2023-01-11 11:35:02
  - Name:       rpool/ROOT/ubuntu_bcuvug@autozsys_ktrxrn
    Created on: 2023-01-10 11:55:15
Users:
  - Name:    root
    History: 
     - rpool/USERDATA/root_qn532u@autozsys_9q50s0 (2023-01-12 10:03:01)
     - rpool/USERDATA/root_qn532u@autozsys_v2erx1 (2023-01-11 11:35:02)
     - rpool/USERDATA/root_qn532u@autozsys_ktrxrn (2023-01-10 11:55:15)
  - Name:    sylvain
    History: 
     - rpool/USERDATA/sylvain_qn532u@autozsys_3929kg (2023-01-12 13:27:23)
     - rpool/USERDATA/sylvain_qn532u@autozsys_j9uroz (2023-01-12 12:26:23)
     - rpool/USERDATA/sylvain_qn532u@autozsys_ga1sja (2023-01-12 11:25:23)
     - rpool/USERDATA/sylvain_qn532u@autozsys_9q50s0 (2023-01-12 10:03:01)
     - rpool/USERDATA/sylvain_qn532u@autozsys_v2erx1 (2023-01-11 11:35:02)
     - rpool/USERDATA/sylvain_qn532u@autozsys_ktrxrn (2023-01-10 11:55:15)
     - rpool/USERDATA/sylvain_qn532u@autozsys_qjwbla (2022-05-31 10:28:55)
     - rpool/USERDATA/sylvain_loeo0c (0001-01-01 00:00:00)

To Reproduce
I don't know how it got into this state.
I only looked into it because I started getting warnings that zsys was not creating new snapshots because free space was less than 20%.
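(For what it's worth, the pool's free space itself can be checked with plain zpool/zfs commands; the 16.2G AVAIL in the listing above is indeed well below 20%:)

❯ zpool list rpool
❯ zfs list -o space rpool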

Expected behavior
Be able to delete old snapshots.

For ubuntu users, please run and copy the following:

  1. ubuntu-bug zsys --save=/tmp/report

Sorry, I cannot paste it:

There was an error creating your Issue: body is too long, body is too long (maximum is 65536 characters). 

Screenshots
n/a

Installed versions:

  • OS: Ubuntu 22.10
  • Zsysd running version: zsysctl 0.5.9, zsysd 0.5.9

Additional context
I might have used the GRUB option to boot into an older snapshot a few months ago.
I might also have used other ZFS snapshot tools at some point.

Might be related to #155, #196, #218.

Some more info that might be relevant:

root@sylvain-thinkpad:~# lsblk -f |grep -v '^loop'
NAME             FSTYPE      FSVER LABEL          UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
zd0              crypto_LUKS 2                    74cb1202-c3fd-4cc2-b6e8-371de9809bd7                
└─keystore-rpool ext4        1.0   keystore-rpool 036e9411-dc83-4357-b9c8-03878ea4bf2e  418,2M     0% /run/keystore/rpool
zd16             swap        1                    2bd9bcac-158c-489f-bd8d-e96410ce512b                [SWAP]
nvme0n1                                                                                               
├─nvme0n1p1      vfat        FAT32                50B8-61AB                             496,8M     3% /boot/grub
│                                                                                                     /boot/efi
├─nvme0n1p2                                                                                           
│ └─cryptoswap   swap        1     cryptoswap     f31ae71c-1c8f-499b-a477-d3b86d429d0c                [SWAP]
├─nvme0n1p3      zfs_member  5000  bpool          17336107262982257653                                
└─nvme0n1p4      zfs_member  5000  rpool          4099056407353820615  
root@sylvain-thinkpad:~# parted /dev/zd0 print
Error: /dev/zd0: unrecognised disk label
Model: Unknown (unknown)                                                  
Disk /dev/zd0: 524MB
Sector size (logical/physical): 512B/8192B
Partition Table: unknown
Disk Flags: 
root@sylvain-thinkpad:~# parted /dev/nvme0n1 print
Model: SSSTC CA5-8D256-Q79 (nvme)
Disk /dev/nvme0n1: 256GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name                  Flags
 1      1049kB  538MB   537MB   fat32        EFI System Partition  boot, esp
 2      538MB   2685MB  2147MB                                     swap
 3      2685MB  4833MB  2147MB  zfs
 4      4833MB  256GB   251GB   zfs

Well, I really shot myself in the foot here...

  • I switched to another user account
  • I checked which filesystem contained the up-to-date data: it was rpool/USERDATA/sylvain_loeo0c
  • I checked the contents of all autozsys snapshots for rpool/USERDATA/sylvain_qn532u (see the sketch after this list): they all had the same data from May 2022, so zsys had been taking snapshots of an older version of the filesystem for all these months
  • Then, instead of manually deleting the older filesystem and snapshots, I ran zsysctl service gc, which promptly destroyed my current data
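(For anyone wanting to do the same comparison: snapshot contents can be browsed through the hidden .zfs directory without mounting anything extra. Something along these lines, using the snapshot names listed above, is enough to peek inside; note that with two datasets stacked on /home/sylvain, .zfs belongs to whichever one is mounted on top:)

❯ ls -l /home/sylvain/.zfs/snapshot/autozsys_qjwbla/
❯ ls -l /home/sylvain/.zfs/snapshot/autozsys_3929kg/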

I'll take this as a learning opportunity; at least I still have my files from May 2022 (and some backups too).
Now I'm going to close this bug as it's a duplicate of #218.