storaged-project/udisks

RFE: Add volume based filesystem objects (btrfs, bcachefs)


jelly commented

UDisks supports Btrfs through a btrfs filesystem object. In Cockpit we noticed
that this isn't enough abstraction to easily work with and represent btrfs (and
maybe other volume-based filesystems as well, such as bcachefs).

In Cockpit we decided to represent btrfs similarly to LVM from a UI perspective:

-> btrfs subvolume
    -> btrfs volume (the filesystem)
        -> btrfs device (backing storage)
            -> block device

To represent the usage per device in a multi device setup we currently parse the output of btrfs filesystem show $uuid, as nothing in UDisks exists to represent a "btrfs device".
To show the mount points per subvolume (outside of btrfs) we parse findmnt --btrfs --json to detect these. This might be a very Cockpit specific issue.
In general, we need to keep a lot of global state, such as a list of block devices per volume.

This seems like a good base for UDisks objects, hopefully re-usable with bcachefs:

design proposal

org.freedesktop.UDisks2.Filesystem.BTRFSSubvolume

Represents a btrfs subvolume as listed by btrfs subvolume list $mountpoint

Methods:

  • Delete
  • Mount
  • Unmount

Properties

  • Volume
  • path
  • level
  • id
  • gen
  • MountPoints

org.freedesktop.UDisks2.Filesystem.BTRFSVolume

Represents a btrfs "filesystem" as listed by btrfs filesystem show

Methods:

  • AddDevice(block)
  • RemoveDevice(block)
  • SetLabel(name)
  • Balance() - either spread out metadata or convert a multidevice volume to raid1 (requires jobs). To be done for bcachefs
  • CreateSubvolume()
  • RemoveSubvolume()
  • CreateSnapshot()
  • Repair()
  • Resize()
  • Replace()? btrfs replace - replacing a failed device. It seems to work differently in bcachefs; needs more investigation

Properties:

  • Label
  • UUID
  • Used (used size)
  • Size - total size
  • MissingDevices - missing devices from a multi-device config
  • Configuration - e.g. RAID1, RAID10 - tricky because btrfs supports different data and metadata configurations.

Multi device RAID configuration example:

$ btrfs device usage /
Device size:             9.09TiB
Device slack:            3.50KiB
Data,RAID10/4:           4.60TiB
Data,RAID10/2:          18.00GiB
Metadata,RAID10/4:      20.00GiB
System,RAID10/4:        16.00MiB
Unallocated:             4.46TiB
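
To illustrate why a single Configuration value is tricky, here is a minimal Python sketch that extracts the profile per block group type from the output above. It assumes the text format exactly as pasted here; the function name and regular expression are made up for illustration and are not part of UDisks or btrfs-progs.

```python
import re
from collections import defaultdict

# Sample copied from the output shown above.
SAMPLE = """\
Device size:             9.09TiB
Device slack:            3.50KiB
Data,RAID10/4:           4.60TiB
Data,RAID10/2:          18.00GiB
Metadata,RAID10/4:      20.00GiB
System,RAID10/4:        16.00MiB
Unallocated:             4.46TiB
"""

# Matches allocation lines such as "Data,RAID10/4:   4.60TiB".
# The "/N" suffix is treated as optional (assumption about the format).
ALLOC_RE = re.compile(
    r"^\s*(Data|Metadata|System),([A-Za-z0-9]+)(?:/\d+)?:\s+\S+\s*$"
)


def profiles_by_type(text):
    """Return a dict mapping block group type to the set of profiles seen."""
    profiles = defaultdict(set)
    for line in text.splitlines():
        match = ALLOC_RE.match(line)
        if match:
            profiles[match.group(1)].add(match.group(2))
    return dict(profiles)


if __name__ == "__main__":
    for kind, found in profiles_by_type(SAMPLE).items():
        print(kind, sorted(found))
```

Since data and metadata can use different profiles, a Configuration property would probably have to expose something like this mapping rather than a single string.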

MissingDevices applies to multi-device setups; see below:

$ btrfs filesystem show
      Label: 'fedora-test'  uuid: cece4dd8-6168-4c88-a4a8-f7c51ed4f82b
        Total devices 3 FS bytes used 2.08GiB
        devid    1 size 11.92GiB used 3.56GiB path /dev/vda5
        devid    2 size 0 used 0 path /dev/sda MISSING
        devid    3 size 512.00MiB used 0.00B path /dev/sdc
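
As a rough sketch of the kind of parsing a client has to do today, the following Python snippet pulls the per-device lines out of the output above and flags missing devices. It assumes the format exactly as pasted here (including the trailing MISSING marker); DeviceLine and parse_devices are hypothetical names, not an existing API.

```python
import re
from dataclasses import dataclass

# Sample copied from the output shown above.
SAMPLE = """\
Label: 'fedora-test'  uuid: cece4dd8-6168-4c88-a4a8-f7c51ed4f82b
    Total devices 3 FS bytes used 2.08GiB
    devid    1 size 11.92GiB used 3.56GiB path /dev/vda5
    devid    2 size 0 used 0 path /dev/sda MISSING
    devid    3 size 512.00MiB used 0.00B path /dev/sdc
"""

# One "devid" line per device; the MISSING suffix is optional.
DEVID_RE = re.compile(
    r"^\s*devid\s+(?P<devid>\d+)\s+size\s+(?P<size>\S+)\s+used\s+(?P<used>\S+)"
    r"\s+path\s+(?P<path>\S+)(?P<missing>\s+MISSING)?\s*$"
)


@dataclass
class DeviceLine:
    devid: int
    size: str   # kept as the human-readable string, e.g. "11.92GiB"
    used: str
    path: str
    missing: bool


def parse_devices(text):
    """Return one DeviceLine per 'devid' line found in the text."""
    devices = []
    for line in text.splitlines():
        m = DEVID_RE.match(line)
        if m:
            devices.append(DeviceLine(
                devid=int(m["devid"]),
                size=m["size"],
                used=m["used"],
                path=m["path"],
                missing=m["missing"] is not None,
            ))
    return devices


if __name__ == "__main__":
    for dev in parse_devices(SAMPLE):
        print(dev)
```

A MissingDevices property would remove the need for this kind of text scraping on the client side.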

org.freedesktop.UDisks2.Filesystem.BTRFSDevice

A device belonging to a "volume" as can be seen in btrfs filesystem show

Methods:

Properties:

  • Volume
  • Size
  • Used
  • Path or link to block device?
  • Stats (btrfs device stats, bcachefs equivalent unknown)

$ btrfs filesystem show
      Label: 'fedora-test'  uuid: cece4dd8-6168-4c88-a4a8-f7c51ed4f82b
        Total devices 3 FS bytes used 2.08GiB
        devid    1 size 11.92GiB used 3.56GiB path /dev/vda5 <--------
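
Taken together, the three proposed interfaces could be modeled roughly as follows. This is only a Python sketch of the data relationships, not D-Bus code: the class names are invented, the field names loosely follow the property lists above, and the missing flag on a device is an assumption added here so that MissingDevices can be derived.

```python
from dataclasses import dataclass, field
from typing import List

GIB = 2**30
MIB = 2**20


@dataclass
class BtrfsDevice:
    path: str               # "Path or link to block device?" in the proposal
    size: int               # bytes
    used: int               # bytes
    missing: bool = False   # assumption: not a proposed property


@dataclass
class BtrfsSubvolume:
    id: int
    path: str
    level: int
    gen: int
    mount_points: List[str] = field(default_factory=list)


@dataclass
class BtrfsVolume:
    uuid: str
    label: str
    devices: List[BtrfsDevice] = field(default_factory=list)
    subvolumes: List[BtrfsSubvolume] = field(default_factory=list)

    @property
    def missing_devices(self) -> List[str]:
        """Paths of member devices flagged as missing."""
        return [d.path for d in self.devices if d.missing]


# Values taken from the sample output above (sizes rounded to whole bytes).
volume = BtrfsVolume(
    uuid="cece4dd8-6168-4c88-a4a8-f7c51ed4f82b",
    label="fedora-test",
    devices=[
        BtrfsDevice("/dev/vda5", size=int(11.92 * GIB), used=int(3.56 * GIB)),
        BtrfsDevice("/dev/sda", size=0, used=0, missing=True),
        BtrfsDevice("/dev/sdc", size=512 * MIB, used=0),
    ],
)

if __name__ == "__main__":
    print(volume.missing_devices)
```

The point of the sketch is that one volume owns many devices and many subvolumes, which is the global state a client currently has to assemble itself.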

All of this would be a ton of work, and careful design to validate the concepts work with bcachefs and btrfs. For btrfs it would be ideal to use libbtrfsutil through libblockdev, if it could add support for getting information about btrfs devices (missing devices, usage, etc.), stats and everything else without parsing CLI output.

  • For bcachefs-tools, I have submitted an issue for providing a library to interact with.
  • For libbtrfsutil, a list of things which are lacking should be collected.

Note that this issue is a very rough draft. There are a lot of open btrfs issues and a multi-device pull request that I haven't had time to go through yet, and overall I will have limited time to dedicate to this until after January.

Thanks for this detailed design proposal! Cc: @cmurf as I don't feel qualified enough for btrfs topology.

You may want to opt for virtual objects (i.e. on the same level as block objects and drive objects), as org.freedesktop.UDisks2.Filesystem is always bound to a specific block object. Multi-disk volumes come to mind; this is somewhat similar in concept to MDRaid objects or LVM logical volumes.

So perhaps the concept of a btrfs volume should be modelled as an org.freedesktop.UDisks2.BTRFSVolumeObject with a single org.freedesktop.UDisks2.BTRFSVolume interface attached to it. The object may then span multiple block objects and would still represent a single filesystem UUID. Just an idea, I might be wrong.

Then I see an issue with instantiation (1:N mapping) of org.freedesktop.UDisks2.Filesystem.BTRFSSubvolume. You may have only a single instance of a D-Bus interface attached to a single D-Bus object. If a volume provides multiple subvolumes, where do you intend to attach such interfaces?

And for org.freedesktop.UDisks2.Filesystem.BTRFSDevice, is this supposed to act as a PV in LVM terminology?

So if I understand correctly:

  • org.freedesktop.UDisks2.Filesystem.BTRFSDevice should be attached to UDisksBlockObject alongside the usual org.freedesktop.UDisks2.Filesystem interface
  • org.freedesktop.UDisks2.Filesystem.BTRFSVolume should be a separate UDisksModuleObject, linking multiple object paths providing the org.freedesktop.UDisks2.Filesystem.BTRFSDevice interface
  • then org.freedesktop.UDisks2.Filesystem.BTRFSSubvolume would probably need to be another class of UDisksModuleObject linking to a single btrfs volume object. It would be nice to have multiple instances of the org.freedesktop.UDisks2.Filesystem.BTRFSSubvolume interface attached to a single btrfs volume object; however, that's not possible with D-Bus AFAIK.
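
The layout described in the three points above can be sketched as a plain mapping from object path to interfaces, similar in shape to what an object manager would report. This is purely illustrative Python: the /org/freedesktop/UDisks2/btrfs prefix, the subvol_N path scheme and the Devices property are assumptions made up for this sketch, and only the new interfaces are shown on the block objects.

```python
VOLUME_IFACE = "org.freedesktop.UDisks2.Filesystem.BTRFSVolume"
DEVICE_IFACE = "org.freedesktop.UDisks2.Filesystem.BTRFSDevice"
SUBVOLUME_IFACE = "org.freedesktop.UDisks2.Filesystem.BTRFSSubvolume"

# Hypothetical prefix for the module's virtual objects.
BASE = "/org/freedesktop/UDisks2/btrfs"


def build_object_tree(uuid, device_block_paths, subvolume_ids):
    """Return {object_path: {interface_name: properties}}.

    Each subvolume gets its own object path, because one object can
    carry only one instance of a given interface.
    """
    # D-Bus object path elements may only contain [A-Za-z0-9_].
    volume_path = f"{BASE}/{uuid.replace('-', '_')}"
    tree = {
        volume_path: {
            VOLUME_IFACE: {"UUID": uuid, "Devices": list(device_block_paths)},
        }
    }
    for block_path in device_block_paths:
        # Attached to the existing block object, next to Filesystem.
        tree[block_path] = {DEVICE_IFACE: {"Volume": volume_path}}
    for subvol_id in subvolume_ids:
        subvol_path = f"{volume_path}/subvol_{subvol_id}"
        tree[subvol_path] = {
            SUBVOLUME_IFACE: {"Volume": volume_path, "id": subvol_id},
        }
    return tree


if __name__ == "__main__":
    tree = build_object_tree(
        "cece4dd8-6168-4c88-a4a8-f7c51ed4f82b",
        ["/org/freedesktop/UDisks2/block_devices/vda5",
         "/org/freedesktop/UDisks2/block_devices/sdc"],
        [256, 257],
    )
    for path, ifaces in tree.items():
        print(path, list(ifaces))
```

With this shape, a volume with N subvolumes results in N separate subvolume objects, each linking back to the volume through its Volume property.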

FYI, my old attempt to enhance device identification was #838. It might still happen one day, though it's a large, intrusive change.

Also, in case this turns into an actual implementation and UDisksModuleObjects are used, uevent handling should be done the intended way, through the UDisksModuleObjectIface.process_uevent method. In other words, avoid making global updates like the lvm2 udisks module does; that brings lots of issues and race conditions.

All of this would be a ton of work, and careful design to validate the concepts work with bcachefs and btrfs.

Forget about bcachefs for now. The btrfs object model is a completely separate thing that does not affect the core daemon interfaces. While the resulting object model for bcachefs might be largely similar, it would need a separate implementation and a new UDisks module anyway. Let's keep things simple for now.

See also #802

In Cockpit we decided to represent btrfs similarly to LVM from a UI perspective

We are doing this in blivet, and while it makes a lot of sense to replicate the LVM "structure", it also brings some issues, especially for users who use btrfs as a simple filesystem only: they suddenly see a "btrfs volume" and have no idea what that means. We had some bug reports for Anaconda from people who expected to simply reformat btrfs to ext4, which we don't support because with this LVM-like representation the btrfs volume needs to be removed first (like running vgremove before reformatting the PVs). I am not saying you shouldn't do it this way in Cockpit, just that you should expect some issues with representing btrfs like LVM.

The biggest challenge with btrfs would be getting the subvolume information -- btrfs needs to be mounted to gather subvolume information and we are definitely not doing this automatically in udisks (we've been there, it was a terrible idea) so the proposed BTRFSSubvolume interface won't be created in most cases.

It's worth adding that the standard org.freedesktop.UDisks2.Filesystem and org.freedesktop.UDisks2.Block interfaces will need to keep working as they do now, i.e. without any particular knowledge of filesystem structure. The new functionality should be added as a module (as it's going to be quite expensive I/O-wise) that needs to be explicitly activated first and would only work as an addition on top of the existing interfaces.