spacejam/sled

Feature request: constant-time way to monitor tree length

Alexis211 opened this issue · 0 comments

Use Case:

We are using Sled as a storage backend for a distributed database system used in Garage (a S3-compatible object store). We'd like to export some metrics to Prometheus to be able to monitor the state of the database. Some of the metrics we are the most interested in are the size of some particular Sled trees (some of which are queues of events waiting to be processed). Currently we call .len() on those trees but it's very costly as soon as the trees become big enough. It would be nice if we could monitor the size of these trees using only constant-time operations. In other words, we need a fast and accurate way to track the evolution of a tree's size over time.

Proposed Change:

I see two solutions to implement this:

  1. Change Sled's data format so that tree nodes know how many items are stored in each subtree, so that .len() can simply be implemented by getting this value on the root node. This option is maybe the one that requires the most changes to Sled code, but also the most satisfying overall.
  2. Add some guarantees to the Events received when watching a tree prefix, so that by counting +1 for inserts, -1 for removals, and +0 for updates, we are sure to have an up-to-date value for the tree size (currently I believe for inserts, we have no way to distinguish whether they are an update of an existing key or the insertion of a new key). This is probably an easier step to get to this feature, however it still means that when I initialize my counters I need to traverse the full tree to get its size, meaning that the initialization time of my app might be several minutes on large systems (which I don't want if I can avoid).

Who Benefits From The Change(s)?

Everyone who has a reason to be continuously monitoring a tree's size. This is particularly relevant to people using Sled trees as queues of events to be processed, as watching queue sizes is crucial to surveiling a system's state.

Alternative Approaches

Manually counting value changes at each insert/removal, which is tedious and error-prone.