jborg/attic

ZFS snapshot ingestion support

Rudd-O opened this issue · 0 comments

Hello! I have a specific need: I want to feed ZFS snapshots (both full and incremental) to attic from a replication stream, and have attic figure out what to store based on the content of the snapshots. Later on, I should be able to ask attic to hand those snapshots back to me, so that I can apply them on the restore side.

The structure of a replication stream is explained here: http://open-zfs.org/wiki/Documentation/ZfsSend. For the purpose of understanding how snapshots work, it's useful to think of a snapshot as a log of data modification orders for the receiving side: a full replication stream orders it to write a specific set of objects, and an incremental replication stream orders it to partially rewrite those objects if they were present before. These logs are naturally mergeable on the receiving side, just by applying them in the right order.
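To make the "log of orders" idea concrete, here is a minimal sketch in Python. It is a simplified model, not the real ZFS stream format: `WriteOrder` and `apply_stream` are hypothetical names made up for illustration, and a real stream carries more record types than plain writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteOrder:
    """Hypothetical stand-in for one modification order in a stream."""
    object_id: int
    offset: int
    data: bytes


def apply_stream(objects, stream):
    """Apply the orders of one stream, in order, to the receiver's objects."""
    for order in stream:
        buf = objects.setdefault(order.object_id, bytearray())
        end = order.offset + len(order.data)
        if len(buf) < end:
            # Grow the object so the write fits (zero-filled gap).
            buf.extend(b"\0" * (end - len(buf)))
        buf[order.offset:end] = order.data
    return objects


# A full stream writes whole objects; an incremental one rewrites parts.
full = [WriteOrder(1, 0, b"hello world"), WriteOrder(2, 0, b"abc")]
incremental = [WriteOrder(1, 6, b"there")]

state = apply_stream({}, full)
state = apply_stream(state, incremental)
```

Applying `full` and then `incremental` leaves object 1 partially rewritten and object 2 untouched, which is the merge-by-ordering property described above.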

I considered the naive implementation of just feeding the snapshot streams to attic and storing them as-is, but doing so would eliminate my ability to do something like attic prune. My goal is to let attic know what has changed, so that attic can either reconstruct the original snapshots or keep track of the data itself, rather than simply storing the streams as big opaque files. That way I can recover space by deleting old backups without having to redo the base full backup. Given the state that attic keeps about what has been written previously, it should be possible for attic to use its diffing algorithm to figure out which later orders modify which previously stored orders, and how.
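As a rough illustration of why block-level knowledge makes pruning possible, here is a small Python sketch of a content-addressed store with reference counts. This is not attic's actual code or API: `ChunkStore` and its methods are hypothetical, and the blocks stand in for whatever units attic would extract from the stream.

```python
import hashlib


class ChunkStore:
    """Hypothetical deduplicating store; archives share identical blocks."""

    def __init__(self):
        self.chunks = {}    # digest -> block bytes
        self.refcount = {}  # digest -> number of references from archives
        self.archives = {}  # archive name -> ordered list of digests

    def add_archive(self, name, blocks):
        digests = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = block  # store new content only once
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
            digests.append(digest)
        self.archives[name] = digests

    def prune(self, name):
        """Delete an archive, freeing blocks no other archive references."""
        for digest in self.archives.pop(name):
            self.refcount[digest] -= 1
            if self.refcount[digest] == 0:
                del self.refcount[digest]
                del self.chunks[digest]

    def restore(self, name):
        return [self.chunks[d] for d in self.archives[name]]


store = ChunkStore()
store.add_archive("monday", [b"block-A", b"block-B"])
store.add_archive("tuesday", [b"block-A", b"block-B2"])
store.prune("monday")  # frees block-B only; tuesday stays restorable
```

In this sketch, deleting the oldest archive never invalidates a newer one, because no archive depends on another archive, only on shared blocks. That is the property lost when streams are stored as opaque files.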

I think it should be possible, and I wonder what you think. The reason I'm contemplating this is quite simple: it's much faster to interpret a ZFS incremental send than it is to diff the modified files, especially when the modified files are gigabytes in size. My math says we're talking about roughly a 100x speedup for incremental backups of large data sets.

Funding may be available, depending on how long the project would run.