Experiment with a thin provisioning metadata format that doesn't use reference counting space maps, but instead does garbage collection. Rather than running a separate GC thread (stop-the-world is obviously out of the question), we push the GC process forward a little at a time as part of block allocation.
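
A minimal sketch of what "GC driven by allocation" could look like, assuming a simple sweep over the block range. Everything here is hypothetical: the class name, the `is_live` callback (standing in for whatever lookup the real metadata would use to decide whether a block is still mapped) and the `work_per_alloc` knob are illustration only, not the actual design.

```python
from collections import deque
from typing import Callable, Deque, Set


class IncrementalGcAllocator:
    """Toy allocator that reclaims unreferenced blocks a few at a time.

    Assumption: the caller records a mapping for each block it receives
    before calling alloc() again, otherwise the sweep may reclaim it.
    """

    def __init__(self, nr_blocks: int, is_live: Callable[[int], bool],
                 work_per_alloc: int = 4) -> None:
        self.nr_blocks = nr_blocks
        self.is_live = is_live                # hypothetical liveness oracle
        self.work_per_alloc = work_per_alloc  # sweep steps per allocation
        self.free: Deque[int] = deque(range(nr_blocks))
        self.allocated: Set[int] = set()      # handed out, not yet reclaimed
        self.cursor = 0                       # next block the sweep examines

    def _gc_step(self) -> None:
        # Examine a bounded number of blocks so no single allocation stalls.
        for _ in range(min(self.work_per_alloc, self.nr_blocks)):
            b = self.cursor
            self.cursor = (self.cursor + 1) % self.nr_blocks
            if b in self.allocated and not self.is_live(b):
                self.allocated.discard(b)
                self.free.append(b)

    def alloc(self) -> int:
        # Every allocation pays for a small slice of collection work.
        self._gc_step()
        if not self.free:
            raise MemoryError("no free blocks found yet")
        b = self.free.popleft()
        self.allocated.add(b)
        return b
```

The point of the sketch is only that collection cost is amortised across allocations, so there is never a pause proportional to the pool size. How liveness is actually determined without reference counts is left open here.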
-
Metadata must use much less space than thinp1.
- store ranges
- compress mapping nodes, sacrificing CPU. There will be a live mapping cache in front of the metadata, so metadata performance isn't so critical.
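
To illustrate why storing ranges saves space, here is a small sketch that keeps mappings as runs of `(thin_begin, data_begin, length)` and extends the previous run when a new mapping is contiguous with it. The `RangeMap` name and layout are assumptions for illustration. It only merges with the preceding run and assumes a block is never inserted twice.

```python
import bisect
from typing import List, Optional, Tuple

Run = Tuple[int, int, int]  # (thin_begin, data_begin, length)


class RangeMap:
    """Hypothetical range-based mapping table, sorted by thin_begin."""

    def __init__(self) -> None:
        self.starts: List[int] = []  # thin_begin of each run, for bisect
        self.runs: List[Run] = []

    def insert(self, thin: int, data: int) -> None:
        """Map one thin block, growing the previous run if contiguous."""
        i = bisect.bisect_right(self.starts, thin)
        if i > 0:
            t, d, n = self.runs[i - 1]
            if t + n == thin and d + n == data:
                self.runs[i - 1] = (t, d, n + 1)
                return
        self.starts.insert(i, thin)
        self.runs.insert(i, (thin, data, 1))

    def lookup(self, thin: int) -> Optional[int]:
        """Return the data block for a thin block, or None if unmapped."""
        i = bisect.bisect_right(self.starts, thin) - 1
        if i >= 0:
            t, d, n = self.runs[i]
            if thin < t + n:
                return d + (thin - t)
        return None
```

With this layout, a thousand sequentially provisioned blocks collapse into a single run rather than a thousand entries. Fragmented volumes benefit much less, which is one reason compression of the nodes is listed separately.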
-
More resilience; in thinp1, if a node high up in a btree gets damaged it can be difficult to repair.
-
Live recovery rather than offline thin_repair.
-
Support 4k block size. This works against the goal of making metadata take up less space, since smaller blocks mean many more mappings, but it will greatly reduce write amplification.
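
The trade-off can be put in rough numbers. The sketch below assumes the worst case for copy-on-write, where every block touched by a write is shared with a snapshot and has to be copied whole. The function names and the 64 KiB comparison size are illustrative choices, not figures from the design.

```python
def cow_write_amplification(offset: int, length: int, block_size: int) -> float:
    """Bytes copied per byte written when every touched block is shared."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    copied = (last - first + 1) * block_size
    return copied / length


def mappings_needed(volume_bytes: int, block_size: int) -> int:
    """Single-block mappings for a fully provisioned volume (no ranges)."""
    return volume_bytes // block_size


KIB = 1024
TIB = 1024 ** 4

if __name__ == "__main__":
    for bs in (64 * KIB, 4 * KIB):
        amp = cow_write_amplification(0, 4 * KIB, bs)
        maps = mappings_needed(TIB, bs)
        print(f"block={bs // KIB}k amplification={amp:.0f}x mappings={maps}")
```

Under these assumptions an aligned 4k write costs 16x with 64k blocks and 1x with 4k blocks, while a fully provisioned 1 TiB volume goes from 16,777,216 to 268,435,456 single-block mappings. That sixteenfold growth is why range storage and node compression matter more at 4k.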
-
Isolate thin transactions from each other. In thinp1 there is only a global transaction. As you get more active thins you are going to naturally get more commits triggered by REQ_FLUSH/FUA.
-
Support short-lived snapshots that don't have a permanent performance hit on the origin.
-
Integrate with blk_archive to make it easy to migrate volumes out of the live pool.
-
Support online defragmentation of both metadata and data.
-
Shrink data volumes.
-
Multiple data volumes?