ZFS Support

Question

fabianfreyer opened this issue 6 years ago · 3 comments

fabianfreyer commented 6 years ago

Probably the first backing file system we want to support.

Spec

State

All calldown datasets shall live under a single base dataset, denoted here by <base>:

<base>/storage - extract fetched layers here
<base>/storage/empty@extracted - the empty base dataset
<base>/storage/<layer_hash>@extracted - extracted dataset including all sub-datasets
<base>/runtime - clone images here from <base>/storage and start jails on them
<base>/runtime/<container_id> - root filesystem of a Jail
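The naming scheme above could be captured in a small helper so the dataset paths are built in one place. This is only a sketch: the `Layout` type and its method names are hypothetical and not part of any existing crate.

```rust
/// Builds dataset and snapshot names for the layout described above.
/// `Layout` is a hypothetical helper type, introduced for illustration only.
pub struct Layout {
    base: String,
}

impl Layout {
    /// `base` is the single base dataset, e.g. a pool or a child dataset.
    pub fn new(base: &str) -> Self {
        Layout { base: base.trim_end_matches('/').to_string() }
    }

    /// `<base>/storage/<layer_hash>`
    pub fn storage_dataset(&self, layer_hash: &str) -> String {
        format!("{}/storage/{}", self.base, layer_hash)
    }

    /// `<base>/storage/<layer_hash>@extracted`
    pub fn storage_snapshot(&self, layer_hash: &str) -> String {
        format!("{}@extracted", self.storage_dataset(layer_hash))
    }

    /// `<base>/storage/empty@extracted`
    pub fn empty_snapshot(&self) -> String {
        self.storage_snapshot("empty")
    }

    /// `<base>/runtime/<container_id>`
    pub fn runtime_dataset(&self, container_id: &str) -> String {
        format!("{}/runtime/{}", self.base, container_id)
    }
}
```

For example, `Layout::new("zroot/calldown").runtime_dataset("c1")` yields `zroot/calldown/runtime/c1` (the pool name is made up).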

Operations

Extract

Extracting an image would look roughly like the following pseudocode:

let mut lower_snapshot = "<base>/storage/empty@extracted";
for layer in image {
    if !exists("<base>/storage/<layer_hash>@extracted") {
        // Clone the layer below, unpack this layer on top of it, then freeze the result.
        let layer_dataset = lower_snapshot.clone_into("<base>/storage/<layer_hash>");
        layer.extract_over("<base>/storage/<layer_hash>");
        layer_dataset.snapshot("extracted");
    }
    // The next layer is stacked on top of this one.
    lower_snapshot = "<base>/storage/<layer_hash>@extracted";
}
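To make the loop above concrete, here is a minimal sketch that only plans the steps instead of touching ZFS. It assumes the set of already existing snapshots is known up front. `Step` and `plan_extract` are hypothetical names, and unpacking the archive is left as an abstract step.

```rust
use std::collections::HashSet;

/// One step of an extraction plan. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
pub enum Step {
    /// Corresponds to `zfs clone <snapshot> <dataset>`.
    Clone { snapshot: String, dataset: String },
    /// Unpack the layer archive over the cloned dataset.
    Unpack { layer_hash: String, dataset: String },
    /// Corresponds to `zfs snapshot <snapshot>`.
    Snapshot { snapshot: String },
}

/// Plan the extraction of `layers` (bottom-most first) under `base`.
/// Layers whose `@extracted` snapshot is already in `existing` are skipped,
/// but still become the lower layer for whatever comes next.
pub fn plan_extract(base: &str, layers: &[&str], existing: &HashSet<String>) -> Vec<Step> {
    let mut steps = Vec::new();
    let mut lower = format!("{}/storage/empty@extracted", base);
    for &hash in layers {
        let dataset = format!("{}/storage/{}", base, hash);
        let snapshot = format!("{}@extracted", dataset);
        if !existing.contains(&snapshot) {
            steps.push(Step::Clone { snapshot: lower.clone(), dataset: dataset.clone() });
            steps.push(Step::Unpack { layer_hash: hash.to_string(), dataset });
            steps.push(Step::Snapshot { snapshot: snapshot.clone() });
        }
        // The next layer is stacked on top of this one.
        lower = snapshot;
    }
    steps
}
```

Returning a plan rather than executing it keeps the layering logic testable without a pool at hand.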

Start a Jail

// 1. Determine the storage "image" to use as the root filesystem.
let layer_hash = runtime_config.topmost_layer;
let basefs = "<base>/storage/<layer_hash>@extracted";
// 2. Clone it into a new runtime root filesystem
basefs.clone_into("<base>/runtime/<container_id>");
// 3. Set up other mounts
// 4. Start Jail
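If we end up shelling out for now, step 2 might be built roughly as below. This is a sketch that only constructs the `zfs clone` invocation without running it. The function name is hypothetical, and error handling and mount options are left out.

```rust
use std::process::Command;

/// Build (but do not run) the `zfs clone` call that creates the root
/// filesystem of a new Jail from an extracted storage layer.
/// `clone_rootfs_command` is a hypothetical helper name.
pub fn clone_rootfs_command(base: &str, layer_hash: &str, container_id: &str) -> Command {
    let basefs = format!("{}/storage/{}@extracted", base, layer_hash);
    let rootfs = format!("{}/runtime/{}", base, container_id);
    let mut cmd = Command::new("zfs");
    cmd.arg("clone").arg(basefs).arg(rootfs);
    cmd
}
```

The caller would then run it with `.status()` or `.output()` and check the exit code before setting up the other mounts.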

Commit a Jail

After the Jail is stopped, committing runtime state creates a new storage layer with the current changes.

// 1. Get the runtime path and calculate the hash of the new layer
let runtime_rootfs = "<base>/runtime/<container_id>";
let hash = calculate_layer_hash(runtime_rootfs);
// 2. Snapshot the current runtime state 
let snap = runtime_rootfs.snapshot("extracted");
let storage_layer = snap.clone_into("<base>/storage/<hash>");
// 3a) Promote the cloned storage layer.
//     At the moment the dependency chain is
//         [base layer] -> [runtime]@extracted -> [new layer].
//     After promotion, this is reversed:
//         [base layer] -> [new layer]@extracted -> [runtime].
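storage_layer.promote();
// Promotion hands the origin snapshot over to the promoted clone, so
// <base>/storage/<hash>@extracted should already exist at this point.
// Only take the snapshot if it is missing.
if !exists("<base>/storage/<hash>@extracted") {
    storage_layer.snapshot("extracted");
}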
// 3b) If we aren't going to reuse the runtime (e.g. to build another layer), delete it:
runtime_rootfs.destroy();
// 4. Update container config to add layer <hash>

Package

This would go through all changes between a layer and the layer immediately below it, using zfs diff <base>/storage/<base_layer>@extracted <base>/storage/<layer>@extracted or similar, and collect the following in a tarball:

whiteout list containing deleted files
changed / added files
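As a rough sketch of the collection step, the parser below sorts `zfs diff -H` output into the two lists. It assumes the tab-separated `-H` format (change type, path, and a second path for renames) and does not decode escaped characters in paths or strip the mountpoint prefix. `LayerChanges` and `parse_zfs_diff` are hypothetical names.

```rust
/// Paths collected from a diff between two layer snapshots.
#[derive(Debug, Default, PartialEq)]
pub struct LayerChanges {
    /// Deleted paths, to be recorded in the whiteout list.
    pub whiteouts: Vec<String>,
    /// Changed or added paths, to be copied into the tarball.
    pub changed: Vec<String>,
}

/// Parse the output of `zfs diff -H <lower> <upper>`.
pub fn parse_zfs_diff(output: &str) -> LayerChanges {
    let mut changes = LayerChanges::default();
    for line in output.lines() {
        let mut fields = line.split('\t');
        match (fields.next(), fields.next(), fields.next()) {
            // Removed: goes on the whiteout list.
            (Some("-"), Some(path), _) => changes.whiteouts.push(path.to_string()),
            // Created or modified: goes into the tarball.
            (Some("+"), Some(path), _) | (Some("M"), Some(path), _) => {
                changes.changed.push(path.to_string())
            }
            // Renamed: the old name disappears, the new name is added.
            (Some("R"), Some(old), Some(new)) => {
                changes.whiteouts.push(old.to_string());
                changes.changed.push(new.to_string());
            }
            // Ignore blank or unrecognised lines.
            _ => {}
        }
    }
    changes
}
```

Treating a rename as a delete plus an add is a simplification. It keeps the tarball format simple at the cost of storing the renamed file again.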

Answer 1 · 2018-07-05T19:56:56.000Z

Implementation notes

The existing Rust libzfs bindings are zfsonlinux-specific. I doubt there will be any FreeBSD support, but I've opened whamcloud/rust-libzfs#63 about it.
We can generate bindings with bindgen for libzfs_core. However, these just implement a few transactions like clone, snapshot, delete_snapshots etc. They don't have any form of dataset enumeration or property inspection/manipulation.
For all other operations, we would have to shell out to zfs and zpool. That kind of sucks.
However, we can use ZFS Channel Programs to implement the missing functionality as libzfs_core implements lzc_channel_program.
We need to figure out some way to parse and generate nvlists. One option seems to be the libnv crate.
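Until something better is available, the shell-out fallback for dataset enumeration might look like this sketch. It builds a `zfs list` call for all snapshots below the base dataset and parses the one-name-per-line output that `-H -o name` produces. Both function names are hypothetical.

```rust
use std::process::Command;

/// Build (but do not run) a `zfs list` call that prints the names of all
/// snapshots below `base`, one per line and without a header.
pub fn list_snapshots_command(base: &str) -> Command {
    let mut cmd = Command::new("zfs");
    cmd.args(["list", "-H", "-r", "-t", "snapshot", "-o", "name", base]);
    cmd
}

/// Split the captured standard output into snapshot names.
pub fn parse_names(stdout: &str) -> Vec<String> {
    stdout
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}
```

The parsed names could feed an existence check such as the `exists(...)` call in the extraction pseudocode.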

Answer 2 · 2018-07-15T21:21:49.000Z

Have you looked at https://github.com/jmesmon/rust-libzfs for libzfs_core bindings? It may need to be adopted but it seems to provide a reasonable starting point.

Answer 3 · 2018-07-19T14:08:30.000Z

@dsheets yes, I've looked at it. I agree it seems like a reasonable starting point, but it doesn't seem to be maintained at the moment and doesn't come with a license. Additionally, I'm not really sure I'd want to end up with my own libzfs bindings in-tree. Also, since the libzfs crate on crates.io is https://github.com/whamcloud/rust-libzfs, I think it would be best if that ended up supporting FreeBSD :)