oetiker/znapzend

There seems to be a problem replicating datasets that are zfs-promoted branches of a bigger tree (rpool/ROOT, zones...)

jimklimov opened this issue · 20 comments

As I was checking how my backups went, I found that the replicas of my rootfs-related datasets are not branches of each other, but seem to be unique data histories (with no "origin" set on any of them on the backup side; I suppose --autoCreation was involved, as my configs have it enabled by default). This causes confusion at least when a backup pool is not regularly accessible and older automatic snapshots get deleted, leaving no common point to resync from (not even among snapshots made by beadm, etc.).

This may also waste space on the backup pool, writing many copies of the rootfs instead of building on top of shared history, but I am not fully sure about that bit (my backup pool claims a high dedup ratio while dedup is off in the dataset settings).

In practice this impacts primarily rpool/ROOT/something(/subsets) and pool/zones/zonerootnames/... that get automatically shuffled as a particular rootfs is activated (whatever is related to the current rootfs is promoted to own the main tree of "live" ZFS data, and older roots get their origins re-pointed at its snapshots), but I suppose other zfs promotions are similarly susceptible.

My guess is that autocreation should check whether the newly found dataset has an origin snapshot attribute, and issue a replication starting from that point rather than from scratch. The "origin" keyword does not appear in the znapzend codebase ;) A minimal sketch of this idea follows the list below.

  • There are likely to be nuances - e.g. an apparent riddle: what if the dataset pointed to by the origin field is not currently on the remote destination?
    ** Is it going to appear, e.g. as part of the same autocreation run? => re-schedule it to appear first, at least up to this snapshot? (maybe recursing through further origins as needed)
    ** Does it have some other znapzend policy on the source, so it can appear from another run? => use that knowledge somehow? Or is that a breach of whatever sanity is in place? Just do a runonce --inherited on that other policy, and wait for it to complete?
    ** Can it make sense to send the current (non-"original") dataset in a way similar to that explored for --sinceForced=... in #497, to make sure that even if it appears as an independent history of incremental snapshots, that history includes the origin snapshot (probably not named by znapzend's timestamped patterns)? Then, if the user later adds that origin to a backed-up policy, it could be re-branched from such a point. Common history is common, whoever owns it at the moment.
    ** Oh, and there are also subsequent zfs promotions shuffling around the current owners of historic snapshots and the consumers who branched off those points...
  • What if the dataset pointed to by the origin field already exists on the remote destination, but does not have that snapshot (likely not named by the znapzend pattern)? => ask the user to consider something like --sinceForced=... from #497 to make that snapshot appear, at the cost of losing and rewriting later data?
  • At least, the current behavior can remain as a fallback for unhandled non-interactive edge cases - there are going to be a lot of them. But the user can be informed in a summary at the end of the run (and maybe by e-mail, per #499).
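
To illustrate the first idea, here is a minimal shell sketch of what origin-aware autocreation could do; the variable names are mine and nothing below is existing znapzend code:

# Hypothetical sketch, not znapzend code: seed a new destination dataset
# from its origin's branch point instead of sending it from scratch.
SRC=nvpool/ROOT/hipster_2019.10-20191115T133333Z   # newly found source dataset (example)
DST=backup-adata/snapshots/nvpool/ROOT             # destination container (example)

ORIGIN="$(zfs get -H -o value origin "$SRC")"
if [ "$ORIGIN" != "-" ]; then
    # The dataset is a clone: if its origin snapshot already exists on the
    # destination, sending the clone-creating increment re-establishes the
    # branch there instead of writing a full copy.
    OLDEST="$(zfs list -H -o name -t snapshot -s creation -d1 -r "$SRC" | head -1)"
    zfs send -i "$ORIGIN" "$OLDEST" | zfs recv -ue "$DST"
else
    : # fall back to the current from-scratch behavior
fi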

Examples from a live inspection session where replication skipped the cleanup (of src and dst) because of this:

[Sat Aug  8 03:30:00 2020] [warn] ERROR: suspending cleanup source dataset because 21 send task(s) failed:
[Sat Aug  8 03:30:00 2020] [warn]  +-->   ERROR: snapshot(s) exist on destination,
    but no common found on source and destination: clean up destination
    backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z
    (i.e. destroy existing snapshots)
...

Looking at the snapshots on the current destination - they range from 2019-12-08 to 2019-12-10, including a few manually named snapshots along the way:

root@jimoi:/root# zfs list -d1 -tall -r backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z
NAME                                                                                                     USED  AVAIL  REFER  MOUNTPOINT
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z                                     8.03G   507G   501M  legacy
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-08T10:43:41Z   270K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@20191208-01                          270K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@20191208-02                          287K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-09T11:11:12Z   738K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T09:30:00Z   306K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T10:30:00Z   272K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T11:30:00Z   272K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T12:30:00Z   263K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T13:30:00Z   266K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T14:30:00Z   266K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T15:30:00Z   266K      -   501M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-10T17:00:00Z      0      -   501M  -

On the source the history starts from 2019-12-19, so indeed there is nothing in common to sync from, right?..

root@jimoi:/root# zfs list -d1 -tall -r nvpool/ROOT/hipster_2019.10-20191115T133333Z
NAME                                                                              USED  AVAIL  REFER  MOUNTPOINT
nvpool/ROOT/hipster_2019.10-20191115T133333Z                                     71.4M   110G   501M  /
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-19T09:30:00Z   117K      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-19T10:00:00Z   286K      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-19T10:30:00Z   276K      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-19T11:00:00Z   278K      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-19T11:30:00Z      0      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-19T12:00:00Z      0      -   501M  -
... skip a few thousand half-hourlies ...
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2020-07-23T10:00:00Z      0      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2020-07-23T10:30:00Z      0      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2020-07-30T15:59:08Z      0      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2020-07-30T18:25:39Z      0      -   501M  -
nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2020-08-08T00:38:26Z      0      -   501M  -

What about the origins?

root@jimoi:/root# zfs get origin {backup-adata/snapshots/,}nvpool/ROOT/hipster_2019.10-20191115T133333Z
NAME                                                                 PROPERTY  VALUE                                                             SOURCE
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z  origin    -                                                                 -
nvpool/ROOT/hipster_2019.10-20191115T133333Z                         origin    nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-12-19-09:10:26  -

So the source rootfs dataset historically cloned off a filesystem to update into a newer release at "2019-12-19-09:10:26", upgrading from "hipster_2019.10*" to "hipster_2020.04*", and the zfs tree was re-balanced to promote the currently activated rootfs as the owner of all history, inverting the relation of who is a clone of whom (data-wise this is equivalent).
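
Conceptually, this inversion is what beadm activate does under the hood - a simplified illustration (dataset names shortened for readability):

:; zfs clone nvpool/ROOT/hipster_2019.10@2019-12-19-09:10:26 nvpool/ROOT/hipster_2020.04   # the upgrade starts life as a clone
:; zfs promote nvpool/ROOT/hipster_2020.04   # activation promotes it to own the shared history
:; # after this, 'zfs get origin' on the OLD rootfs points at a snapshot of the NEW one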

The destination dataset is a poor orphan without an origin - in fact most of them are (the oldest rootfs datasets on the backup do have proper origins, since I probably initialized the backup pool by replicating my system without znapzend). I suppose whenever znapzend found a new rootfs and had autoCreation enabled, it just made an automatic snapshot and sent it from scratch as the starting point, and rotated from there, independently of the other rootfs'es on the source pool.

Looking at the manually named snapshots on the source datasets seems to confirm this guess: the ones expected to be common between the "hipster_2019.10*" rootfs source and backup are now part of the "hipster_2020.04*" history, in a relation that znapzend currently does not handle:

root@jimoi:/root# zfs list -tall -r nvpool/ROOT/hipster_2020.04-20200622T165833Z | grep @2019

nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-01-29-09:14:04                                    9.39M      -   521M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-02-13-01:22:00                                        0      -   521M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-03-22-08:56:38                                     173K      -   539M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20190826-01                                            98.7M      -   536M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20190830-01                                             382K      -   536M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20190910-01                                            3.64M      -   539M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20190910-02                                            6.41M      -   539M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20191003-01                                            85.1M      -   542M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20191003-02                                                0      -   542M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-10-03-15:33:08                                        0      -   542M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-10-03-23:19:11                                    88.0M      -   542M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-10-04-12:28:06                                    85.1M      -   543M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-11-15-14:33:11                                     205K      -   544M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20191208-01                                             282K      -   501M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@20191208-02                                             297K      -   501M  -
nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-12-19-09:10:26                                        0      -   501M  -

Sending an incremental replication from "the owner of history" to its differently-named clone seems to be a valid operation:

root@jimoi:/root# zfs send -R -I \
    nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-12-19-09:10:26 \
    nvpool/ROOT/hipster_2019.10-20191115T133333Z@znapzend-auto-2019-12-19T12:00:00Z \
    | mbuffer -m 1g | zfs recv -vue backup-adata/snapshots/nvpool/ROOT

in @  0.0 KiB/s, out @  0.0 KiB/s, 28.0 KiB total, buffer   7% full
...

It showed some data read into the buffer, but for the past several minutes it has been blinking the destination disk while showing no traffic, so I'm a bit lost as to whether ZFS is doing anything at all... maybe the kernel is thinking about how to handle it...

UPDATE: Alas, after 7 minutes it found there is no good way to send from the original origin:

cannot receive: local origin for clone backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z@20191219-01 does not exist
mbuffer: error: outputThread: error writing to <stdout> at offset 0x7000: Broken pipe

summary: 30.0 KiByte in  7min 42.7sec - average of  0.1 KiB/s

indeed:

root@jimoi:/root# zfs list backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-12-19-09:10:26
cannot open 'backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-12-19-09:10:26': dataset does not exist

Makes sense in hindsight: the new name of the rootfs started life as a poor orphan... so it has no history either:

root@jimoi:/root# zfs list -tall -r backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z
NAME                                                                                                                       USED  AVAIL  REFER  MOUNTPOINT
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z                                                       9.57G   507G   530M  legacy
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@znapzend-auto-2020-07-22T17:00:00Z                     324K      -   507M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@znapzend-auto-2020-07-22T17:30:00Z                     326K      -   507M  -
...
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@znapzend-auto-2020-07-23T10:30:00Z                     324K      -   507M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@znapzend-auto-2020-07-30T15:59:08Z                     333K      -   530M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@znapzend-auto-2020-07-30T18:25:39Z                     333K      -   530M  -
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z@znapzend-auto-2020-08-08T00:38:26Z                        0      -   530M  -

Probably a general non-disruptive solution is not possible: if we already have a history written, and suddenly a clone is made from an old manually named snapshot that is not present on the destination, we might not be able to replicate without rolling back stuff from the trunk. If an even older common point exists, the destination could be cloned from that, replicated up to the "origin" of the newly found source clone, and onwards from that point as the history of the new source clone; but this is likely to be fragile and work in some cases at best (which is still better than nothing), and admins should have taken care in the past that their backup increments retain those old common snapshots.

At the very least, when such a new situation arises and there are no snapshots on the destination newer than the divergence point (at least none named manually rather than via the znapzend-configured pattern) - e.g. after a beadm update and/or a package installation which makes a rootfs backup clone - znapzend can employ the logic for --since=X (or --sinceForced=X to roll back automated snapshots as needed) to ensure that the snapshot which is the origin of the newly found clone appears on the destination, so that a cleanly branched zfs tree can grow, and maybe rebalance, from that point (we can probably detect the discrepancy of origins to understand that a zfs promote happened on the source since we last looked; a sketch of that check follows). For the more complex cases we can stop and spew recommendations as we do now.
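
For example, such a discrepancy check could remember the origin between runs in a user property - the org.znapzend:last-origin name below is hypothetical, purely for illustration:

# Hypothetical sketch: detect that a 'zfs promote' happened since the last run
DS=nvpool/ROOT/hipster_2019.10-20191115T133333Z
PREV="$(zfs get -H -o value org.znapzend:last-origin "$DS")"   # '-' if never recorded
CUR="$(zfs get -H -o value origin "$DS")"
if [ "$PREV" != "-" ] && [ "$PREV" != "$CUR" ]; then
    echo "NOTE: origin of $DS changed ($PREV -> $CUR); likely promoted since last run"
fi
zfs set org.znapzend:last-origin="$CUR" "$DS"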

A big data point for designing such logic: here's how the BE layout on that system looks, on the destination and on the source:

root@jimoi:/root# zfs list -d1 -tfilesystem -o name,origin -r {backup-adata/snapshots/,}nvpool/ROOT
NAME                                                                          ORIGIN
backup-adata/snapshots/nvpool/ROOT                                            -
backup-adata/snapshots/nvpool/ROOT/firefly_0215                               -
backup-adata/snapshots/nvpool/ROOT/firefly_0215a                              backup-adata/snapshots/nvpool/ROOT/firefly_0215@20180203-01
backup-adata/snapshots/nvpool/ROOT/hipster_2016.10_mate                       backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2016-11-09-23:38:21
backup-adata/snapshots/nvpool/ROOT/hipster_2016.10_mate_drm-20170430T155411Z  backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2017-05-03-04:49:41
backup-adata/snapshots/nvpool/ROOT/hipster_2017.04                            backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2017-05-31-06:51:36
backup-adata/snapshots/nvpool/ROOT/hipster_2017.04-20170903T231101Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2017-11-12-18:40:47
backup-adata/snapshots/nvpool/ROOT/hipster_2017.10                            backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2017-12-08-11:46:57
backup-adata/snapshots/nvpool/ROOT/hipster_2017.10-20171227T103659Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2017-12-30-10:46:38
backup-adata/snapshots/nvpool/ROOT/hipster_2017.10-20171227T103659Z-nv        backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2018-01-31-23:41:11
backup-adata/snapshots/nvpool/ROOT/hipster_2017.10-20180203T155526Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2018-05-03-14:19:26
backup-adata/snapshots/nvpool/ROOT/hipster_2018.04-20180503T141758Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2018-07-24-11:26:56
backup-adata/snapshots/nvpool/ROOT/hipster_2018.04-20180724T112647Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2018-11-13-09:32:46
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20181113T103249Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2019-02-13-01:22:00
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190129T091404Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2019-01-29-09:14:04
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190213T012200Z           backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@2019-03-22-08:56:38
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1      -
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191003T110320Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191003T110320Z-nvm       -
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191004T122806Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191004T122806Z-backup-1  -
backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191004T122806Z-bak1      -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z.x         -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191219T091024Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200113T161936Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200114T153553Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200114T153553Z-backup-1  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200127T101549Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200127T101549Z-backup-1  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200414T095506Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200416T054436Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200416T054436Z-backup-1  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200416T054436Z-backup-2  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200416T054436Z-backup-3  -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200425T140249Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20200425T140249Z-backup-1  -
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04                            -
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200608T084252Z           -
backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200622T165833Z           -

nvpool/ROOT                                                                   -
nvpool/ROOT/firefly_0215                                                      -
nvpool/ROOT/firefly_0215a                                                     nvpool/ROOT/firefly_0215@20180203-01
nvpool/ROOT/hipster_2016.10_mate                                              nvpool/ROOT/hipster_2020.04-20200622T165833Z@2016-11-09-23:38:21
nvpool/ROOT/hipster_2016.10_mate_drm-20170430T155411Z                         nvpool/ROOT/hipster_2020.04-20200622T165833Z@2017-05-03-04:49:41
nvpool/ROOT/hipster_2017.04                                                   nvpool/ROOT/hipster_2020.04-20200622T165833Z@2017-05-31-06:51:36
nvpool/ROOT/hipster_2017.04-20170903T231101Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2017-11-12-18:40:47
nvpool/ROOT/hipster_2017.10                                                   nvpool/ROOT/hipster_2020.04-20200622T165833Z@2017-12-08-11:46:57
nvpool/ROOT/hipster_2017.10-20171227T103659Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2017-12-30-10:46:38
nvpool/ROOT/hipster_2017.10-20171227T103659Z-nv                               nvpool/ROOT/hipster_2020.04-20200622T165833Z@2018-01-31-23:41:11
nvpool/ROOT/hipster_2017.10-20180203T155526Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2018-05-03-14:19:26
nvpool/ROOT/hipster_2018.04-20180503T141758Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2018-07-24-11:26:56
nvpool/ROOT/hipster_2018.04-20180724T112647Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2018-11-13-09:32:46
nvpool/ROOT/hipster_2018.10-20181113T103249Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-02-13-01:22:00
nvpool/ROOT/hipster_2018.10-20190129T091404Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-01-29-09:14:04
nvpool/ROOT/hipster_2018.10-20190213T012200Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-03-22-08:56:38
nvpool/ROOT/hipster_2018.10-20190322T085637Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@20190910-02
nvpool/ROOT/hipster_2018.10-20191003T110320Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-10-03-15:33:08
nvpool/ROOT/hipster_2018.10-20191003T110320Z-nvm                              nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-10-04-12:28:06
nvpool/ROOT/hipster_2018.10-20191004T122806Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-11-15-14:33:11
nvpool/ROOT/hipster_2019.10-20191115T133333Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2019-12-19-09:10:26
nvpool/ROOT/hipster_2019.10-20191219T091024Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-01-14-15:35:53
nvpool/ROOT/hipster_2019.10-20200113T161936Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-01-13-16:19:36
nvpool/ROOT/hipster_2019.10-20200114T153553Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-01-27-10:15:58
nvpool/ROOT/hipster_2019.10-20200127T101549Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-04-14-09:55:32
nvpool/ROOT/hipster_2019.10-20200414T095506Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-04-16-05:44:41
nvpool/ROOT/hipster_2019.10-20200416T054436Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-04-25-14:02:51
nvpool/ROOT/hipster_2019.10-20200425T140249Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-04-27-06:29:42
nvpool/ROOT/hipster_2020.04                                                   nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-06-08-08:42:53
nvpool/ROOT/hipster_2020.04-20200608T084252Z                                  nvpool/ROOT/hipster_2020.04-20200622T165833Z@2020-06-22-16:58:33
nvpool/ROOT/hipster_2020.04-20200622T165833Z                                  -
  • So on the source nvpool there are a couple of independent trees: the "hipster*" ones layered over the years, and the "firefly*" ones with the equivalent of the old OpenSolaris SXCE Failsafe boot. Notably, the "history owners" like nvpool/ROOT/hipster_2020.04-20200622T165833Z and nvpool/ROOT/firefly_0215 have no origins.
    ** I did not find any particular property that would indicate that a dataset owns snapshot history shared with other datasets, so probably the only way to know is to look at all dataset origins across the whole pool at a given moment.
  • On the destination backup-adata pool the situation is partially similar to the source, with two separate trees present; all the old "hipster*" rootfs'es have origins in backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1 (no longer present on the source), probably replicated manually, while the newer ones are orphans - technically each a tree with a single node. A recent child of that common origin is present on both pools however, at nvpool/ROOT/hipster_2018.10-20190213T012200Z, so we might guess at the cross-dataset-name incremental replications needed, or (at least manually) temporarily zfs promote these same-named datasets on source and destination to simplify the matching of history ownership and the replication of siblings.
  • As for manually named snapshots, e.g. @20191208-01 in some examples above: they were made by a recursive snapshot of nvpool/ROOT, so it may be neither easy nor reliable/predictable to guess which one to use as the common snapshot to base replication on (a sketch of one approach follows this list). Hopefully the snapshot names implicated as someone's origin are better in this regard, at least when the names are fairly unique, as made by beadm in this case.
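
For instance, a hypothetical helper (bash/ksh93 syntax, names from the examples above) could intersect the snapshot name lists of the two sides and pick the newest match by creation time rather than by name, since name sorting can mislead:

# Hypothetical helper: newest snapshot name present on both sides of one dataset
SRC=nvpool/ROOT/hipster_2019.10-20191115T133333Z
DST=backup-adata/snapshots/nvpool/ROOT/hipster_2019.10-20191115T133333Z
comm -12 \
    <(zfs list -H -o name -t snapshot -d1 -r "$SRC" | sed 's/.*@//' | sort) \
    <(zfs list -H -o name -t snapshot -d1 -r "$DST" | sed 's/.*@//' | sort) \
| while read SNAP; do
    echo "$(zfs get -H -p -o value creation "$SRC@$SNAP") $SNAP"   # epoch + name
done | sort -n | tail -1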

More data points: created and activated a new BE with beadm to check what happens... and also to update the destination pool with an inheritable history :)

The new one became the owner of the rootfs history from the beginning of time, and the origin of snapshots for the other rootfs clones, including the recently active one:

root@jimoi:/root# zfs list -d1 -tfilesystem -o name,origin -r nvpool/ROOT
NAME                                            ORIGIN
...
nvpool/ROOT/hipster_2020.04                     nvpool/ROOT/hipster_2020.04-20200809T192157Z@2020-06-08-08:42:53
nvpool/ROOT/hipster_2020.04-20200608T084252Z    nvpool/ROOT/hipster_2020.04-20200809T192157Z@2020-06-22-16:58:33
nvpool/ROOT/hipster_2020.04-20200622T165833Z    nvpool/ROOT/hipster_2020.04-20200809T192157Z@2020-08-09-19:21:58
nvpool/ROOT/hipster_2020.04-20200809T192157Z    -

root@jimoi:/root# df -k /
Filesystem                                   1K-blocks   Used Available Use% Mounted on
nvpool/ROOT/hipster_2020.04-20200622T165833Z 114922085 543340 114378745   1% /
root@jimoi:/root# zfs list -d1 -tall -r nvpool/ROOT/hipster_2020.04-20200809T192157Z
NAME                                                                               USED  AVAIL  REFER  MOUNTPOINT
nvpool/ROOT/hipster_2020.04-20200809T192157Z                                      85.4G   109G   525M  /
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postsplit-01                           62K      -   266M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postsplit-02                           51K      -   268M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postsplit-03                           54K      -   268M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@2014-04-16-15:50:45                    55K      -   268M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postupgrade-01                        196M      -   270M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@1                                     663K      -   284M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@20140418-01                           664K      -   284M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@20140425-01                           207M      -   287M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postupgrade-20140803Z134336          93.3M      -   327M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@20150106-01                          93.7M      -   327M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postupgrade-20150115Z201009           231M      -   328M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postupgrade_pkgips-20151212T193923Z   219M      -   426M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@2015-12-17-13:37:28                   125M      -   426M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postupgrade_pkgips-20160118T170356Z  86.6M      -   426M  -
...
nvpool/ROOT/hipster_2020.04-20200809T192157Z@znapzend-auto-2020-07-30T18:25:39Z    344K      -   531M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@znapzend-auto-2020-08-08T00:38:26Z    324K      -   531M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@2020-08-09-19:21:58                   237K      -   531M  -
nvpool/ROOT/hipster_2020.04-20200809T192157Z@postupgrade_pkgips-20200809T194524Z   101K      -   526M  -

Similarly for zone roots, with the complication that only the zbe dataset (and probably its children, if any) got cloned - the ROOT dataset holding it remains unchanged - and that the snapname/timestamp is unique to each zone root, taken at the moment it was cloned for the update:

root@jimoi:/root# zfs list -d1 -tsnapshot -r nvpool/zones/testdhcp/ROOT/zbe-36 | head
NAME                                                                   USED  AVAIL  REFER  MOUNTPOINT
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2019-02-02T09:30:00Z  24.5K      -    27K  -
nvpool/zones/testdhcp/ROOT/zbe-36@2019-02-13-01:22:21                  126K      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2019-02-14T09:00:00Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2019-03-21T00:00:00Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@2019-03-22-08:58:00                     0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@20190322-01                             0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2019-04-18T05:30:00Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2019-05-16T07:30:00Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2019-06-13T08:00:00Z      0      -   835M  -
...
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2020-07-23T09:30:00Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2020-07-23T10:00:00Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2020-07-30T18:31:27Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@znapzend-auto-2020-08-08T00:38:11Z      0      -   835M  -
nvpool/zones/testdhcp/ROOT/zbe-36@2020-08-09-19:24:15                     0      -   835M  -

The latest (currently activated) sequentially numbered ZBE is the origin for the other clones of this zone root, except for the numbers earlier removed (on the source) with beadm destroy:

root@jimoi:/root# zfs get origin nvpool/zones/testdhcp/ROOT/zbe-3{0,1,2,3,4,5,6}
cannot open 'nvpool/zones/testdhcp/ROOT/zbe-30': dataset does not exist
cannot open 'nvpool/zones/testdhcp/ROOT/zbe-32': dataset does not exist
NAME                               PROPERTY  VALUE                                                  SOURCE
nvpool/zones/testdhcp/ROOT/zbe-31  origin    nvpool/zones/testdhcp/ROOT/zbe-36@2020-04-27-06:30:10  -
nvpool/zones/testdhcp/ROOT/zbe-33  origin    nvpool/zones/testdhcp/ROOT/zbe-36@2020-06-08-08:44:04  -
nvpool/zones/testdhcp/ROOT/zbe-34  origin    nvpool/zones/testdhcp/ROOT/zbe-36@2020-06-22-16:59:26  -
nvpool/zones/testdhcp/ROOT/zbe-35  origin    nvpool/zones/testdhcp/ROOT/zbe-36@2020-08-09-19:24:15  -
nvpool/zones/testdhcp/ROOT/zbe-36  origin    -                                                      -

For the setup of datasets and snapshot names elaborated above, a direct attempt to send the new dataset name as an increment from the old one seems to work: ZFS discovers where it needs to clone and append automatically, at least for a replication stream (which is what I want, but not necessarily what znapzend users might want if they wish to e.g. exclude some child datasets from replication... not sure that is currently possible, but it is a constraint against the zfs send -R mode):

root@jimoi:/root# zfs send -R -I nvpool/ROOT/hipster_2020.04-20200809T192157Z@{20190830-01,postupgrade_pkgips-20200809T194524Z} | mbuffer -m 1G | zfs recv -vue backup-adata/snapshots/nvpool/ROOT

in @  0.0 KiB/s, out @  0.0 KiB/s, 2220 KiB total, buffer   0% full
cannot open 'backup-adata/snapshots/nvpool/ROOT/hipster_2020.04-20200809T192157Z': dataset does not exist
in @ 24.9 MiB/s, out @  0.0 KiB/s, 2240 KiB total, buffer  35% full
in @  0.0 KiB/s, out @  0.0 KiB/s, 2240 KiB total, buffer 100% full
receiving incremental stream of nvpool/ROOT/hipster_2020.04-20200809T192157Z@znapzend-auto-2019-09-10T11:30:00Z
 into backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@znapzend-auto-2019-09-10T11:30:00Z
in @  0.0 KiB/s, out @  0.0 KiB/s,  122 MiB total, buffer 100% full
received 120MB stream in 115 seconds (1.04MB/sec)
in @  0.0 KiB/s, out @  0.0 KiB/s,  122 MiB total, buffer 100% full
receiving incremental stream of nvpool/ROOT/hipster_2020.04-20200809T192157Z@20190910-01 into backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@20190910-01
in @  0.0 KiB/s, out @  0.0 KiB/s,  241 MiB total, buffer 100% full
received 119MB stream in 74 seconds (1.61MB/sec)
in @  0.0 KiB/s, out @  0.0 KiB/s,  241 MiB total, buffer 100% full
receiving incremental stream of nvpool/ROOT/hipster_2020.04-20200809T192157Z@20190910-02 into backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20190322T085637Z-bak1@20190910-02
in @ 1807 KiB/s, out @  0.0 KiB/s,  257 MiB total, buffer 100% full
received 15.4MB stream in 63 seconds (251KB/sec)
...

I guess the laptop now has a long interesting night ahead...

UPDATE: Cool, it can even recognize increments already present in the destination pool, though they are probably part of some other clone's history at the moment:

in @  0.0 KiB/s, out @  0.0 KiB/s, 1556 MiB total, buffer 100% full
receiving incremental stream of nvpool/ROOT/hipster_2020.04-20200809T192157Z@znapzend-auto-2019-10-15T03:56:25Z
    into backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191004T122806Z-bak1@znapzend-auto-2019-10-15T03:56:25Z
snap backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191004T122806Z-bak1@znapzend-auto-2019-10-15T03:56:25Z 
    already exists; ignoring
received 0B stream in 45 seconds (0B/sec)

in @  0.0 KiB/s, out @  0.0 KiB/s, 1571 MiB total, buffer 100% full
receiving incremental stream of nvpool/ROOT/hipster_2020.04-20200809T192157Z@znapzend-auto-2019-10-19T10:00:00Z
    into backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191004T122806Z-bak1@znapzend-auto-2019-10-19T10:00:00Z
snap backup-adata/snapshots/nvpool/ROOT/hipster_2018.10-20191004T122806Z-bak1@znapzend-auto-2019-10-19T10:00:00Z
    already exists; ignoring
received 0B stream in 43 seconds (0B/sec)

And that is the bit I hate about delays in zfs recv - seen even in this single-command mode (it may be worse when everything runs as separate commands, e.g. the mbuffers are not as well utilized to fill up on one side while the other side "thinks", when there are many single-use mbuffer processes):

received 312B stream in 62 seconds (5B/sec)
vs.
received 15.4MB stream in 63 seconds (251KB/sec)
vs.
received 394MB stream in 82 seconds (4.81MB/sec)

I think every receive spends a minute procrastinating and then a few seconds doing I/O. I haven't seen any snapshot increment today that would clock under 60s. Grrr... Looking at the consoles, I see zero active I/O on the source and destination pools (zpool iostat 1); then the zfs send|zfs recv pipe says "receiving incremental stream of X into Y", there is a burst of I/O for a second, and the pipe logs how it received a few kilobytes in a lot of seconds...

UPDATE: It "sped up", now the base time taken for an increment to be received is 42 seconds or so. I remain puzzled.

So in the end, the boiled-down routine I followed in the shell for the rootfs and zoneroots was:

  • beadm create ... to clone the filesystems,
  • update the OS,
  • beadm activate ... to zfs promote the newest roots,
  • for each global rpool/ROOT/rootfs-N and local pool/zones/some/more/path/levels/zonename/ROOT/zbe-M, I listed the zfs origins to find the latest ones available on the destination and the newest on the source (the previously active root is now the newest child of the current history owner created just above), for example:
root@jimoi:/root# zfs list -d1 -tfilesystem -o name,origin -r {backup-adata/snapshots/,}nvpool/zones/omni151018/ROOT
NAME                                                        ORIGIN
backup-adata/snapshots/nvpool/zones/omni151018/ROOT         -
backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe     backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-30@2017-05-03-04:49:45
backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-1   backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-30@2017-05-31-06:51:40
backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-13  backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-30@2017-12-30-10:46:40
...
backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-28  backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-30@2019-02-13-01:22:08
backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-29  backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-30@2019-01-29-09:14:13
backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-30  -
...
backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-59  -
backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-8   backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-30@2017-11-12-18:40:50
backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-9   backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-30@2017-12-08-11:47:02
nvpool/zones/omni151018/ROOT                                -
nvpool/zones/omni151018/ROOT/zbe                            nvpool/zones/omni151018/ROOT/zbe-60@2017-05-03-04:49:45
nvpool/zones/omni151018/ROOT/zbe-1                          nvpool/zones/omni151018/ROOT/zbe-60@2017-05-31-06:51:40
nvpool/zones/omni151018/ROOT/zbe-13                         nvpool/zones/omni151018/ROOT/zbe-60@2017-12-30-10:46:40
nvpool/zones/omni151018/ROOT/zbe-14                         nvpool/zones/omni151018/ROOT/zbe-60@2018-01-31-23:41:19
nvpool/zones/omni151018/ROOT/zbe-17                         nvpool/zones/omni151018/ROOT/zbe-60@2018-05-03-14:19:29
nvpool/zones/omni151018/ROOT/zbe-18                         nvpool/zones/omni151018/ROOT/zbe-60@2018-07-24-11:27:04
nvpool/zones/omni151018/ROOT/zbe-22                         nvpool/zones/omni151018/ROOT/zbe-60@2018-05-10-09:01:23
nvpool/zones/omni151018/ROOT/zbe-23                         nvpool/zones/omni151018/ROOT/zbe-60@2018-11-13-09:32:55
nvpool/zones/omni151018/ROOT/zbe-24                         nvpool/zones/omni151018/ROOT/zbe-60@2018-09-18-09:23:55
nvpool/zones/omni151018/ROOT/zbe-27                         nvpool/zones/omni151018/ROOT/zbe-60@2018-10-17-21:49:25
nvpool/zones/omni151018/ROOT/zbe-28                         nvpool/zones/omni151018/ROOT/zbe-60@2019-02-13-01:22:08
nvpool/zones/omni151018/ROOT/zbe-29                         nvpool/zones/omni151018/ROOT/zbe-60@2019-01-29-09:14:13
nvpool/zones/omni151018/ROOT/zbe-30                         nvpool/zones/omni151018/ROOT/zbe-60@2019-03-22-08:57:10
nvpool/zones/omni151018/ROOT/zbe-31                         nvpool/zones/omni151018/ROOT/zbe-60@2019-10-04-12:28:11
nvpool/zones/omni151018/ROOT/zbe-32                         nvpool/zones/omni151018/ROOT/zbe-60@2019-11-15-14:33:16
nvpool/zones/omni151018/ROOT/zbe-34                         nvpool/zones/omni151018/ROOT/zbe-60@2019-12-19-09:10:31
nvpool/zones/omni151018/ROOT/zbe-35                         nvpool/zones/omni151018/ROOT/zbe-60@2020-01-14-15:35:59
nvpool/zones/omni151018/ROOT/zbe-36                         nvpool/zones/omni151018/ROOT/zbe-60@2020-01-13-16:20:04
nvpool/zones/omni151018/ROOT/zbe-37                         nvpool/zones/omni151018/ROOT/zbe-60@2020-01-27-10:16:04
nvpool/zones/omni151018/ROOT/zbe-39                         nvpool/zones/omni151018/ROOT/zbe-60@2020-04-14-09:56:00
nvpool/zones/omni151018/ROOT/zbe-41                         nvpool/zones/omni151018/ROOT/zbe-60@2020-04-16-05:44:47
nvpool/zones/omni151018/ROOT/zbe-55                         nvpool/zones/omni151018/ROOT/zbe-60@2020-04-27-06:29:48
nvpool/zones/omni151018/ROOT/zbe-57                         nvpool/zones/omni151018/ROOT/zbe-60@2020-06-08-08:43:36
nvpool/zones/omni151018/ROOT/zbe-58                         nvpool/zones/omni151018/ROOT/zbe-60@2020-06-22-16:58:55
nvpool/zones/omni151018/ROOT/zbe-59                         nvpool/zones/omni151018/ROOT/zbe-60@2020-08-09-19:22:53
nvpool/zones/omni151018/ROOT/zbe-60                         -
nvpool/zones/omni151018/ROOT/zbe-8                          nvpool/zones/omni151018/ROOT/zbe-60@2017-11-12-18:40:50
nvpool/zones/omni151018/ROOT/zbe-9                          nvpool/zones/omni151018/ROOT/zbe-60@2017-12-08-11:47:02

Here the interesting lines are the last ones in the "history of clones":
** on destination: backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-29 => backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-30@2019-01-29-09:14:13
** on source: nvpool/zones/omni151018/ROOT/zbe-59 => nvpool/zones/omni151018/ROOT/zbe-60@2020-08-09-19:22:53
** beware of the fewer-digit names that mix up the sorting, or play with -s creation ;)

  • I construct the CLI command like:
:; zfs send -R -I ${SRCROOTCONTAINER}/${CURRENTBE}@{$NEWEST_DST_SNAP,$NEWEST_SRC_SNAP} \
   | mbuffer | zfs recv -vue ${DSTROOTCONTAINER}

where "${SRCROOTCONTAINER}/${CURRENTBE}@${NEWEST_SRC_SNAP}" is literally the origin string copy-pasted from the listing above (nvpool/zones/omni151018/ROOT/zbe-59 => nvpool/zones/omni151018/ROOT/zbe-60@2020-08-09-19:22:53), and then edited in command line to prepend the destination's newest snapshot tag, e.g.:

:; zfs send -R -I nvpool/zones/omni151018/ROOT/zbe-60@{2019-01-29-09:14:13,2020-08-09-19:22:53} \
  | mbuffer -m 128M | zfs recv -vue backup-adata/snapshots/nvpool/zones/omni151018/ROOT
  • This works for a while, often reporting that it skips a snapshot increment already present on the destination, and that for now it appends to the history of that older snapshot's dataset on the destination:
in @  0.0 KiB/s, out @  0.0 KiB/s,  372 KiB total, buffer   2% full
receiving incremental stream of nvpool/zones/omni151018/ROOT/zbe-60@2019-03-22-08:57:10 into backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-30@2019-03-22-08:57:10
snap backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-30@2019-03-22-08:57:10 already exists; ignoring
received 0B stream in 7 seconds (0B/sec)
  • For a couple of the ~10 zone-roots sent this way so far, I got a hiccup like:
...
in @  0.0 KiB/s, out @  0.0 KiB/s,  376 KiB total, buffer   2% full
receiving incremental stream of nvpool/zones/omni151018/ROOT/zbe-60@20190322-01 into backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-30@20190322-01
in @  0.0 KiB/s, out @  0.0 KiB/s,  376 KiB total, buffer   2% full
cannot restore to backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-30@20190322-01: destination already exists
cannot open 'backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-60': dataset does not exist

mbuffer: error: outputThread: error writing to <stdout> at offset 0x5e000: Broken pipe

summary:  378 KiByte in  2min 15.7sec - average of  2.8 KiB/s
mbuffer: warning: error during output to <stdout>: Broken pipe

but requesting a re-send starting with that snapshot name seems to work, e.g.:

:; zfs send -R -I nvpool/zones/omni151018/ROOT/zbe-60@{20190322-01,2020-08-09-19:22:53} \
   | mbuffer -m 128M | zfs recv -vue backup-adata/snapshots/nvpool/zones/omni151018/ROOT

in @  0.0 KiB/s, out @  0.0 KiB/s,  300 KiB total, buffer   0% full
cannot open 'backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-60': dataset does not exist
in @  0.0 KiB/s, out @  0.0 KiB/s,  324 KiB total, buffer   2% full

This probably impacts snapshot histories where same-named (but different) snapshots have been made and replicated over time. Note that the clone for "zbe-86" was still not made in the example above. Possibly I'll have to branch it off under that name from an earlier snapshot on the destination (one common with the source); will see soon... so far it has been quiet for a long time...

TODO: When it is all done, check that the histories of existing destination datasets remain in place and that the new zoneroot did get branched off an older snapshot. Otherwise, maybe making the clone from an ancient point and receiving increments into it more explicitly is the proper way (recursively with children, somehow)?..
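
A quick way to verify that branching afterwards (using a dataset name from the examples above; a '-' in the output would mean it was received as an independent history rather than as a branch):

:; zfs get -H -o value origin backup-adata/snapshots/nvpool/zones/omni151018/ROOT/zbe-60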

At least, running a dozen operations like this in parallel keeps the destination pool fairly busy. Although each zfs send|zfs recv pipe still lags for quite a while between actual writes (though... a different "while" - for zoneroots the minimum overhead seems to be about 10 sec), someone has something to say almost every second, so zpool iostat 1 looks less spiky.

Clone and send did not go too well...

root@jimoi:/root# zfs clone backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-56@2019-01-29-09:14:11 backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86
root@jimoi:/root# zfs send -R -I nvpool/zones/omnibld/ROOT/zbe-86@{2019-01-29-09:14:11,2020-08-09-19:22:41} | mbuffer -m 128M | zfs recv -vue backup-adata/snapshots/nvpool/zones/omnibld/ROOT
 
in @  0.0 KiB/s, out @  0.0 KiB/s,  304 KiB total, buffer   0% fullreceiving incremental stream of nvpool/zones/omnibld/ROOT/zbe-86@2019-02-13-01:22:06 into backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86@2019-02-13-01:22:06
cannot receive incremental stream: most recent snapshot of backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86 does not
match incremental source
mbuffer: error: outputThread: error writing to <stdout> at offset 0x51000: Broken pipe

summary:  324 KiByte in 30.3sec - average of 10.7 KiB/s
mbuffer: warning: error during output to <stdout>: Broken pipe

Maybe it should just have carried on with what it was doing - without a named destination dataset until the zfs send -R stream creates it - and hopped over the "offending" increments...

UPDATE: zfs promote backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86 helped, it seems. At least this is chugging along again, and it included the correct version of that somehow-offending snapshot:

receiving incremental stream of nvpool/zones/omnibld/ROOT/zbe-86@2019-03-22-08:57:02 into backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86@2019-03-22-08:57:02
in @  248 KiB/s, out @  0.0 KiB/s,  448 KiB total, buffer   0% full
snap backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86@2019-03-22-08:57:02 already exists; ignoring
received 0B stream in 1 seconds (0B/sec)
in @  0.0 KiB/s, out @  0.0 KiB/s,  448 KiB total, buffer   0% full

### this did not pass earlier
receiving incremental stream of nvpool/zones/omnibld/ROOT/zbe-86@20190322-01 into backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86@20190322-01
in @ 15.4 KiB/s, out @  0.0 KiB/s,  572 KiB total, buffer   1% full
in @  504 KiB/s, out @  0.0 KiB/s,  572 KiB total, buffer   1% full
received 124KB stream in 15 seconds (8.27KB/sec)

receiving incremental stream of nvpool/zones/omnibld/ROOT/zbe-86@znapzend-auto-2019-04-18T05:30:00Z into backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86@znapzend-auto-2019-04-18T05:30:00Z
in @  0.0 KiB/s, out @  0.0 KiB/s,  572 KiB total, buffer   2% full
received 312B stream in 9 seconds (34B/sec)
...

Update: I raised the long-lags issue on illumos IRC; a viable theory is that, since my source and destination pools have quite a few datasets (me being me) and hordes of snapshots (largely thanks to znapzend, and to my desire to roll back into the considerable past if needed) - overall on the order of 1k datasets and 100k snapshots - zfs recv spends a lot of time iterating (recursively!) to find a guid of a dataset that would match something in the received stream; a pstack is at https://pastebin.com/ctU1kLse, and it takes arguably too much time given that all the ZFS metadata is cached in the kernel and no disk I/O has to happen...

Regarding zfs promote: it seems to be required on the destination.

The benefit for situations like the one my backup pool found itself in is that the zfs send -R -I ... | zfs recv -e pipe guesses which dataset an incremental snapshot belongs to. Increments are apparently tracked by their previous snapshot's guid. So when znapzend (unaware of how to handle promoted snapshots) made datasets from scratch, I got many short history owners where the snapshots received now landed - in one or another ZBE, instead of becoming the history of the new dataset name to be made. A destination dataset made (as a clone from the old newest-common snapshot of a history-owner dataset) and then promoted does seem to solve this, and the new history owner gets all the incoming incremental snapshots. The recipe is summarized below.
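
To summarize the recipe that ended up working here (the commands are the ones shown above, gathered in order; adjust names to your own datasets):

:; # 1. branch the new dataset name off the newest snapshot common with the source:
:; zfs clone backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-56@2019-01-29-09:14:11 \
     backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86
:; # 2. make the new name the owner of the shared history on the destination:
:; zfs promote backup-adata/snapshots/nvpool/zones/omnibld/ROOT/zbe-86
:; # 3. send the increments; zfs recv matches parent snapshots by guid:
:; zfs send -R -I nvpool/zones/omnibld/ROOT/zbe-86@{2019-01-29-09:14:11,2020-08-09-19:22:41} \
     | mbuffer -m 128M | zfs recv -vue backup-adata/snapshots/nvpool/zones/omnibld/ROOT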

One more bit of wisdom from the trenches: it does not always happen that a top-level rootfs snapshot common between the source and backup pools exists under the same name on both sides for the child datasets too (e.g. on my systems I have split-off /usr, /var and some others). They are usually there for snapshots transferred earlier with a replication stream, and may be spotty after znapzend took over.

So care has to be taken to choose a consistent set of snapshots for the subtree that is to pose as the new rootfs name, and then to clone and promote each dataset individually (the zfs tool does not offer recursion for those operations). After that, zfs send -R ... newrootfsname@lastsnap | zfs recv -vue newrootfsname can be performed to receive the newrootfsname and its children into the backup. A loop for the per-child legwork is sketched below.
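
A hypothetical loop for that per-child clone-and-promote (OLDBE, NEWBE and SNAP are placeholders for the names chosen as described above):

# Hypothetical sketch: clone+promote the BE root and each child individually
DSTROOT=backup-adata/snapshots/nvpool/ROOT
OLDBE=hipster_2020.04-20200622T165833Z    # existing history owner on the destination
NEWBE=hipster_2020.04-20200809T192157Z    # new rootfs name to create
SNAP=2020-08-09-19:21:58                  # snapshot consistent across the subtree
for ds in $(zfs list -H -o name -t filesystem -r "$DSTROOT/$OLDBE"); do
    child="${ds#$DSTROOT/$OLDBE}"         # empty string for the BE root itself
    zfs clone "$ds@$SNAP" "$DSTROOT/$NEWBE$child" \
        && zfs promote "$DSTROOT/$NEWBE$child"
done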

Oh, and did I mention that it can take days with sufficiently many snapshots involved (you might want to script hopping only between the milestones relevant for other rootfs'es to branch off, to remake them as inheriting parts of the same history, and ignore the older zfs-auto-snaps)? And that, to avoid confusion, nothing should appear in these datasets in the meantime - so the znapzend service had better stay off for the time being, or at least not touch this tree (which should be easy to arrange with the "enabled" toggle in the policy definition).

The IRC discussion of the slow zfs recv in this situation provided a few insights:

  • zfs send should not be a bottleneck (and does not seem to be, as mbuffer is usually full, waiting for the receiver); it likely just looks the dataset up by name, which is relatively fast, and the guid is just a property you read. An incremental snapshot includes the guid of the parent snapshot it is a diff from, so that even if you rename a dataset on either the target or the source system, zfs recv can find the snapshot anyway.
    ** Ideas were pitched for libzfs, or zfs recv when given a multi-dataset stream, or the kernel side, to cache/structure the guids in memory and in an API so that they can be queried faster than by walking the ARC pages of pool metadata over and over again.
    ** Generally an in-kernel cache (and an API to ask whether a guid exists in a subtree under a specified dataset) would be useful for someone doing a series of independent zfs recv's (znapzend tends to; its replication-package mode is not there yet, for some reasons), probably with cache-expiration properties... though to be useful it would have to be sized so that the guids from the start of a pool are not pushed out while we read the end of it ;)
    ** Well, a cache is one option - but a more efficient lookup structure is another. If the guid->snapshot lookup is critical for performance, and it seems like it is, we should probably have a specific mapping table for that translation. I imagine right now it is, instead, doing the regular dataset/snapshot enumeration, which can be somewhat expensive and lead to a lot of random I/O (on disk, or in the ARC afterwards).
  • Performance of zfs recv can be impacted by the destination dataset layout, such as what I changed via zfs promote:
    ** Pretty sure the walk looking for the guid is done first under the tree you specify as the recv destination, so most backup programs for big pools deeply nest and target their dataset structure to avoid ever walking the (whole) tree looking for one. It hurts most when you have wide levels, and the guid->snap lookup is definitely needed, at least for incrementals - not just as a sanity check. But if you e.g. break the older snapshots off into their own dataset and make the new one you're recv'ing into a clone off one of them (break the snapshots up), you avoid ever having more than a few thousand "things" at any level of the dataset hierarchy, which keeps the recv snapshot-finding fast. You do something like a rename of the current name to thing-old, then clone it back from the latest snap into its new name, and now the thing it searches at the first stage of recv is only 1 thing again (see the sketch after this list). This trick is primarily good for backups (programs running over a live dataset would be upset about the re-mounting), and those pools are also the ones which tend to house a lot more snapshots than the source.
    ** Another commenter, who "never noticed that problem, and we're doing tons of replication on pools with millions of snapshots", noted that "we don't use the zfs tool, only interact with libzfs directly" - so that may be another hint at the code bottleneck...
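
A minimal sketch of that "break the snapshots up" trick, with hypothetical dataset and snapshot names:

:; # park the snapshot-heavy history under a different name...
:; zfs rename backup-adata/snapshots/nvpool/ROOT/somebe backup-adata/snapshots/nvpool/ROOT/somebe-old
:; # ...then recv into a fresh clone of its newest snapshot: the clone carries
:; # no snapshots of its own yet, so the first-stage guid search stays cheap
:; zfs clone backup-adata/snapshots/nvpool/ROOT/somebe-old@latest-snap backup-adata/snapshots/nvpool/ROOT/somebe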

At least, after cloning and promoting the target datasets first and sending a replication stream afterwards, I can confirm this cut the lag between received snapshots from about a minute to about 5 sec - much more bearable!

Something to add to the documentation, maybe?

Noted in the documentation with #512; I also posted the shell script that addresses part of the problem there, as contrib/ directory content.

A bit of good news: there seems to be no special handling required for catching up a cloned rootfs or zbe with child datasets that is NOT a tip to be zfs promote'd: in any case, for boot environments and zones, the original tool (e.g. beadm) responsible for the original snapshot+clone seems to have made a recursive snapshot, so the snaps in the parent BE root and its child datasets are same-named. It wouldn't hurt for the script to test for this pedantically (for other use-cases, if nothing else) so that we branch the backup off a snapshot common to the whole bunch - a quick check is sketched below - but it seems redundant in this practical case. So I'm letting the same contributed script pick through my rootfs backups now...
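
Such a pedantic check could be as simple as this sketch (hypothetical BE and snapshot names; it just reports children missing the chosen snapshot):

# Hypothetical check: confirm one snapshot name exists across a whole subtree
BE=nvpool/ROOT/hipster_2020.04-20200809T192157Z
SNAP=2020-08-09-19:21:58
for ds in $(zfs list -H -o name -t filesystem -r "$BE"); do
    zfs list "$ds@$SNAP" >/dev/null 2>&1 || echo "MISSING: $ds@$SNAP"
done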

Yet another data point: on Solaris 10u10, while zfs send -R clone@snap | zfs recv does detect the "clone origin" on its own and attaches the received increments as a branch off the existing tree, I can not explicitly send an incremental snapshot that spans the two datasets - using the two points (the origin and the oldest individual snapshot) seen by zfs in the earlier command; in this case it just aborts with an "incremental source must be in same filesystem" message.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.