lvmteam/lvm2

Native D-Bus daemon

lvmdbusd is currently a Python wrapper around the lvm shell. This works, but is inelegant and inefficient.

A better approach would be to have a native D-Bus activated daemon in C. Since systems using the daemon are expected to use it for most or all operations, the daemon can perform optimizations, such as efficiently storing metadata in-memory instead of having to re-read it every time.

lvmdbusd is currently a Python wrapper around the lvm shell. This works, but is inelegant and inefficient.

Anything that forks & execs a command line and processes its output is going to be less than ideal, but don't discount Python's performance with regard to dbus. Additionally, lvm added JSON output and the ability to fetch the entire lvm state in one command, which the daemon uses.
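
For reference, a minimal sketch of that one-command fetch, assuming a recent lvm2 that provides the `fullreport` command and JSON report output (run as root; the exact JSON layout depends on the lvm2 version):

```python
# Fetch the whole lvm state in one invocation and parse it.
# Assumes lvm2 provides 'fullreport' and '--reportformat json'.
import json
import subprocess

out = subprocess.run(
    ['lvm', 'fullreport', '--reportformat', 'json'],
    capture_output=True, text=True, check=True,
).stdout

state = json.loads(out)
# Typically there is a top-level "report" list with per-VG
# pv/vg/lv/seg sections, but treat the layout as version-specific.
print(list(state.keys()))
```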

A better approach would be to have a native D-Bus activated daemon in C. Since systems using the daemon are expected to use it for most or all operations, the daemon can perform optimizations, such as efficiently storing metadata in-memory instead of having to re-read it every time.

The lvm dbus daemon already stores lvm state in memory so that it has something to compare against when it re-fetches state from lvm on a change. This is needed to provide the PropertiesChanged signal (ref: https://dbus.freedesktop.org/doc/dbus-specification.html), so that we can inform the client which properties have changed and what the new values are in the signal. You should find that retrieving information over dbus is faster than getting it from the command line, as it's cached.
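
To illustrate, a sketch of the diff a caching daemon needs before it can emit org.freedesktop.DBus.Properties.PropertiesChanged; the interface and property names below are made up for illustration, not the real lvmdbusd object model:

```python
# Compare the cached property dict with freshly fetched state and
# report only what changed, in the shape PropertiesChanged expects.
def diff_properties(old, new):
    changed = {k: v for k, v in new.items() if old.get(k) != v}
    invalidated = [k for k in old if k not in new]
    return changed, invalidated

old = {'Name': 'vg0', 'FreeBytes': 10 << 30}   # previously cached
new = {'Name': 'vg0', 'FreeBytes': 8 << 30}    # re-fetched from lvm

changed, invalidated = diff_properties(old, new)
# A D-Bus object would now emit something like:
#   PropertiesChanged('com.example.Vg', changed, invalidated)
# so clients learn the new values without re-querying everything.
```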

Until lvm has something resembling a C API, I don't see a C dbus service implementation being feasible.

lvmdbusd is currently a Python wrapper around the lvm shell. This works, but is inelegant and inefficient.

Anything that forks & execs a command line and processes its output is going to be less than ideal, but don't discount Python's performance with regard to dbus. Additionally, lvm added JSON output and the ability to fetch the entire lvm state in one command, which the daemon uses.

Indeed, and I will be porting Qubes OS to use this.

A better approach would be to have a native D-Bus activated daemon in C. Since systems using the daemon are expected to use it for most or all operations, the daemon can perform optimizations, such as efficiently storing metadata in-memory instead of having to re-read it every time.

The lvm dbus daemon already stores lvm state in memory so that it has something to compare against when it re-fetches state from lvm on a change. This is needed to provide the PropertiesChanged signal (ref: https://dbus.freedesktop.org/doc/dbus-specification.html), so that we can inform the client which properties have changed and what the new values are in the signal. You should find that retrieving information over dbus is faster than getting it from the command line, as it's cached.

How does it compare to manual caching? Is there an advantage to using the LVM shell instead of separate command-line invocations?

Until lvm has something resembling a C API, I don't see a C dbus service implementation being feasible.

I am not so sure. My understanding is that one reason for not having a C API is that LVM does not want to have a stable API. A C dbus service can use the internal unstable APIs.

How does it compare to manual caching? Is there an advantage to using the LVM shell instead of separate command-line invocations?

The bane of all caching is cache coherence. The lvm command line tools, when they change state, will actually call a dbus method on the lvm dbus daemon to invalidate its state. However, this still fails if a block device has an lvm signature and someone does a wipefs -a on it. There is an open bug to specify the --udev option in the service file so that the daemon always pays attention to udev events too, so that cases like this cause the daemon to correctly update its state. By default it monitors udev events only until an lvm notify event comes in; then it disables the udev monitoring.

The lvm shell performance gains from not forking & execing the lvm command line repeatedly are negligible. I would need to run the unit tests again to quote a percentage, but I believe it's in the single digits.
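
A rough sketch of the --udev idea, using pyudev to watch block-device uevents and drop the cached state when something changes underneath the daemon (this is my own illustration of the approach, not necessarily how lvmdbusd implements it):

```python
# Invalidate an in-memory lvm state cache on block-device uevents,
# e.g. when someone runs 'wipefs -a' on a PV behind lvm's back.
import pyudev

cached_state = {}            # whatever the daemon keeps in memory

def invalidate_cache(reason):
    cached_state.clear()     # force a re-fetch from lvm on next request
    print('cache invalidated:', reason)

context = pyudev.Context()
monitor = pyudev.Monitor.from_netlink(context)
monitor.filter_by('block')   # only care about block devices

for device in iter(monitor.poll, None):
    # Any add/change/remove event may have altered lvm metadata
    # without going through the lvm command line tools.
    invalidate_cache(f'{device.action} {device.device_node}')
```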

A C dbus service can use the internal unstable APIs.

True, but you still need a code base that has some kind of API to leverage. The last time I tried, around 2015, I was hindered by the fact that the command-line handling is intertwined with the rest of the code, spanning many layers deep in the call stack. I believe some improvements have been made; perhaps @teigland can speak to this.

How does it compare to manual caching? Is there an advantage to using the LVM shell instead of separate command-line invocations?

The bane of all caching is cache coherence. The lvm command line tools, when they change state, will actually call a dbus method on the lvm dbus daemon to invalidate its state. However, this still fails if a block device has an lvm signature and someone does a wipefs -a on it. There is an open bug to specify the --udev option in the service file so that the daemon always pays attention to udev events too, so that cases like this cause the daemon to correctly update its state. By default it monitors udev events only until an lvm notify event comes in; then it disables the udev monitoring.

The lvm shell performance gains from not forking & execing the lvm command line repeatedly are negligible. I would need to run the unit tests again to quote a percentage, but I believe it's in the single digits.

How much performance could be gained if the (C) daemon and/or LVM shell just assumed that no other instances of the LVM tools are running, and skipped re-validating the state of the system for each and every command?

This is not something that would be good to use in production, but it does provide an indicator of how much performance there is to gain.
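
Something like the following rough benchmark could give that indicator: time N separate `lvm vgs` invocations against one persistent lvm shell fed the same commands on stdin (assuming the lvm shell accepts piped commands; run as root on a test machine, numbers will vary):

```python
# Rough comparison of fork/exec-per-command vs one lvm shell session.
import subprocess
import time

N = 50

t0 = time.perf_counter()
for _ in range(N):
    subprocess.run(['lvm', 'vgs'], capture_output=True, text=True)
separate = time.perf_counter() - t0

t0 = time.perf_counter()
# Assumption: 'lvm' with no arguments runs its shell and reads
# commands from stdin when piped.
subprocess.run(['lvm'], input='vgs\n' * N + 'exit\n',
               capture_output=True, text=True)
shell = time.perf_counter() - t0

print(f'{N} separate invocations: {separate:.2f}s')
print(f'one lvm shell session:   {shell:.2f}s')
```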

ATM lvm2 does not support any library C API - we tried this in the past - but it had too many 'dirty' corners where we were not able to come up with a good solution - so the only API left is to basically call the lvm command itself.

The lvm2 shell has a minor performance advantage - it saves the 'exec/run-time linking' time and shortens the initialization time.
There is not much to save by caching the state of the system - since for lvm2, whatever happens outside of the VG lock must be revalidated when the VG lock is grabbed...

And a general comment - using dBus is the last way to optimize anything in Linux - it's more about easier usage in some cases - but usually at increased resource usage (i.e. tons of completely useless scans, endless updates of internal state which no one really cares to ever read, and many, many other issues...)

ATM lvm2 does not support any library C API - we tried this in the past - but it had too many 'dirty' corners where we were not able to come up with a good solution - so the only API left is to basically call the lvm command itself.

What are those corners?

The lvm2 shell has a minor performance advantage - it saves the 'exec/run-time linking' time and shortens the initialization time. There is not much to save by caching the state of the system - since for lvm2, whatever happens outside of the VG lock must be revalidated when the VG lock is grabbed...

What if the daemon grabbed the VG lock and never let it go? That is reasonable for a system-wide daemon, though it would require

And a general comment - using dBus is the last way to optimize anything in Linux - it's more about easier usage in some cases - but usually at increased resource usage (i.e. tons of completely useless scans, endless updates of internal state which no one really cares to ever read, and many, many other issues...)

Oh yeah, dbus isn’t exactly known for being optimal. One could come up with an interface that just mapped LVM2 commands onto D-Bus operations, though, without any of the inefficient niceties.
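
As a sketch of what such a thin mapping could look like, here is a minimal dbus-python service that forwards a method call straight to an lvm invocation; the bus name, object path and interface are invented for illustration (not the real com.redhat.lvmdbus1 API), and owning a name on the system bus would additionally need a D-Bus policy file:

```python
# Thin, hypothetical D-Bus wrapper: one method that runs 'lvm <args>'.
import subprocess

import dbus
import dbus.mainloop.glib
import dbus.service
from gi.repository import GLib

BUS_NAME = 'org.example.ThinLvm'        # hypothetical bus name
IFACE = 'org.example.ThinLvm.Manager'   # hypothetical interface


class ThinLvm(dbus.service.Object):
    def __init__(self, bus):
        super().__init__(bus, '/org/example/ThinLvm')

    @dbus.service.method(IFACE, in_signature='as', out_signature='is')
    def Run(self, argv):
        """Run 'lvm <argv...>' and return (exit_code, stdout)."""
        res = subprocess.run(['lvm'] + [str(a) for a in argv],
                             capture_output=True, text=True)
        return res.returncode, res.stdout


if __name__ == '__main__':
    dbus.mainloop.glib.DBusGMainLoop(set_as_default=True)
    bus = dbus.SystemBus()
    name = dbus.service.BusName(BUS_NAME, bus)
    service = ThinLvm(bus)
    GLib.MainLoop().run()
```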

ATM lvm2 does not support any library C API - we tried this in the past - but it had too many 'dirty' corners where we were not able to come up with a good solution - so the only API left is to basically call the lvm command itself.

What are those corners?

For 'suspend' modes we do need to lock a portion of the lvm2 code into memory - so it can't be swapped out and cause a system deadlock....
We can't really handle this workflow well without a major redesign of the lvm2 code base, and there are simply no hands for such a thing...

The lvm2 shell has a minor performance advantage - it saves the 'exec/run-time linking' time and shortens the initialization time. There is not much to save by caching the state of the system - since for lvm2, whatever happens outside of the VG lock must be revalidated when the VG lock is grabbed...

What if the daemon grabbed the VG lock and never let it go? That is reasonable for a system-wide daemon, though it would require

lvm2 does not target such kinds of operations - i.e. an admin could run many different commands working with different VGs....
If there is a major restriction of the supported device world (a set of constraints) - there are more optimal ways....

lvm2 is targeted at global, parallel and even clustered usage (i.e. devices are attached and used by different hosts - so LVs from one VG could be activated on different hosts...)
So we cannot use such short-cuts...

And a general comment - using dBus is the last way to optimize anything in Linux - it's more about easier usage in some cases - but usually at increased resource usage (i.e. tons of completely useless scans, endless updates of internal state which no one really cares to ever read, and many, many other issues...)

Oh yeah, dbus isn’t exactly known for being optimal. One could come up with an interface that just mapped LVM2 commands onto D-Bus operations, though, without any of the inefficient niceties.

D-Bus is inefficient by design. The current lvm2 D-Bus support just does not make it much worse than it already is... ;)
For efficiency, users should stay with the lvm2 commands...