jthornber/thin-provisioning-tools

RFE: Offline deduplication

Opened this issue · 9 comments

There are cases where it would be quite useful to be able to compact a thin pool by deduplicating identical blocks while the system is offline.

Could that be used to make a thin_dedup tool?

Just for reference, @tasket's wyng backup might be an interesting project to take a glance at (not equivalent but maybe some overlap):

https://github.com/tasket/wyng-backup


I think Demi is looking for a tool to deduplicate thin volumes in-place. From comments I've read in Linux discussion (and here?) I gathered that this would not be on the thinp roadmap.

OTOH, it seems like a narrowly-targeted form of dedup could be approximated for two target volumes by scanning for differences, snapshotting one volume, then updating it with the mapped differences (and finally replacing the snapshotted original with the snapshot).

FWIW, Wyng can facilitate this as part of a restore from an archive (using a sparse write mode to update an existing volume, it will skip over chunks that match). But that means performing a backup first.
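To make the "skip over chunks that match" idea concrete, here is a rough sketch of a sparse update in Python. It is not Wyng's code and not part of thin-provisioning-tools; the function name `sparse_update` and the 64 KiB chunk size are assumptions for illustration. It assumes the target is a snapshot (or copy) of roughly the same size as the source, so that only the chunks actually rewritten would need new allocations in a thin pool.

```python
CHUNK_SIZE = 64 * 1024  # assumed chunk size; a real pool's chunk size may differ


def sparse_update(source_path, target_path, chunk_size=CHUNK_SIZE):
    """Overwrite only the chunks of target that differ from source.

    Hypothetical helper: returns (chunks_written, chunks_skipped).
    Assumes source is not shorter than target; any target data past
    the end of source is left untouched.
    """
    written = 0
    skipped = 0
    with open(source_path, "rb") as src, open(target_path, "r+b") as dst:
        offset = 0
        while True:
            src_chunk = src.read(chunk_size)
            if not src_chunk:
                break
            dst.seek(offset)
            dst_chunk = dst.read(len(src_chunk))
            if src_chunk == dst_chunk:
                # Identical chunk: leave it alone so it stays shared.
                skipped += 1
            else:
                dst.seek(offset)
                dst.write(src_chunk)
                written += 1
            offset += len(src_chunk)
    return written, skipped
```

Run against a snapshot of one volume with the other volume as the source, this would leave every matching chunk shared with the snapshot's origin. It still reads both volumes in full, so it saves space rather than time.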

> I think Demi is looking for a tool to deduplicate thin volumes in-place.

That’s correct. My goal is to be able to reclaim shared space on a Qubes OS system.

A particular use case for Qubes: when backing up and then restoring thin LVs that were snapshots (e.g. of cloned Qubes OS VMs) using most methods, one usually ends up with much more space used after the restore than before. The original pool shared many blocks between LVs, but once all of the LVs are restored to a new thin pool, no blocks are shared.


> A particular use case for Qubes: when backing up and then restoring thin LVs that were snapshots (e.g. of cloned Qubes OS VMs) using most methods, one usually ends up with much more space used after the restore than before. The original pool shared many blocks between LVs, but once all of the LVs are restored to a new thin pool, no blocks are shared.


This can actually be disastrous, as it can make backups impossible to restore. Deduplication during restore is necessary to prevent this problem.
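As a rough way to gauge how much sharing a restore would lose, one could hash fixed-size chunks across the volumes and count the unique ones. The sketch below is only an illustration: `dedup_estimate` is a hypothetical name, the 64 KiB chunk size is an assumption, and it says nothing about how the pool actually maps blocks.

```python
import hashlib


def dedup_estimate(paths, chunk_size=64 * 1024):
    """Count total and unique chunks across the given volumes or images.

    Hypothetical helper: returns (total_chunks, unique_chunks). The gap
    between the two approximates the space that sharing could save.
    """
    seen = set()
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += 1
                # Identical chunks produce identical digests.
                seen.add(hashlib.sha256(chunk).digest())
    return total, len(seen)
```

If `total_chunks` is much larger than `unique_chunks`, a restore without deduplication would need correspondingly more space than the original pool used.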