hoytech/quadrable

few questions

Nuhvi opened this issue · 2 comments

Nuhvi commented

Fascinating project, really loved the README.
However, I am not very familiar with C++, so I thought asking these questions would be faster than digging into the codebase, and I hope this is useful to others as an FAQ.

  1. Can LMDB be swapped for other databases like LevelDB, RocksDB, or IndexedDB? Or are there performance or other reasons not to support such an abstraction?
  2. If the answer to (1) is no, can Quadrable be used as a lightweight DB in browsers or native mobile apps? Are there any guides on how to do so?
  3. Since the merkle tree is a binary tree, I imagine trees with integer keys will have very deep leaves. Does that demand O(n) hashes (where n is the depth), or is there an efficient way to compress that? Maybe it is in the README and I missed it.
  4. Can you expand on why a radix tree is more complex than a binary tree? And what optimisations are used to get the same advantages as a shallower tree?

Thank you very much.

hoytech commented

Hi, thanks! No worries, I'm happy to answer questions.

  1. In theory, yes. I recently did some abstraction work to support "MemStore", which allows nodes to be stored ephemerally in memory instead of in LMDB. This mostly confined the LMDB-specific code to a few wrapper functions. To support another backend store, we'd probably also need abstractions for lmdb::txn and lmdb::dbi. As you suggest, performance is a consideration: whatever backend is chosen should have good read performance, since most quadrable operations perform many small reads to traverse the merkle tree.
  2. I haven't tried this yet, but I believe it should be possible to compile quadrable with Emscripten. This should generate a WASM package that can be loaded by browsers (you'd want some wrapper functions too, I think). If this is done before any backend-abstraction work, it would only be usable in the "MemStore" configuration. But this may actually be OK, depending on what you want the browser to do. For strfry (my nostr relay), I'm thinking that clients that already load events on each page load will be able to save a ton of bandwidth by loading these events into a MemStore and using it to avoid repeated downloads.
  3. Yes, good observation! Binary trees of course get deeper than trees with a higher branching factor. Quadrable has a special encoding for integers that results in shallower trees than you would get if you used (for example) big-endian uint64, essentially by compressing the keys while preserving their lexical ordering. So the number of hashes will be far fewer than 64 or (this would be terrible) 256. However, yes, the number of hashes required to update/delete an element is on the order of log2(N), where N is the number of elements in the DB. Furthermore, the compression adds a few extra bits of overhead. A few points help mitigate this, however:
    • If you're doing multiple write operations at a time, many of these hashes can be amortised over one update, since nodes shared by several modified paths only need to be rehashed once per update batch.
    • Encoded proofs and sync messages do not need to transmit witnesses for levels where there are no divergent prefixes. This means that the extra compression overhead described above (and/or any bias/sparseness in the key-space) does not increase proof sizes.
  4. I should probably try to implement quadrable as a radix tree so I can actually justify the claim of complexity in the README! Off the top of my head, I think the proof encoding would be trickier to implement because you'd need to provide several witnesses per level (one for each sibling) and indicate which position in the node each one corresponds to. But I'm certain it could be done, and it might have some advantages in terms of storage, depending on the backing store used. However, trees with larger branching factors will have larger proof sizes. For example, consider two perfectly balanced trees with 4 elements, one with branching factor 2 and one with BF 4. The BF 2 tree has two levels and therefore requires 2 witnesses (plus the element itself). The BF 4 tree has only one level, but you'll need 3 witnesses (plus the element itself). You could encode the witnesses inside a BF 4 level using a tree-like hashing scheme, but then you've degenerated back to a binary tree! If you're going to go that route, it might be simpler (for proof encoding purposes) to just use a pure binary tree and cluster related nodes together in storage as an optimisation.
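The backend abstraction from point 1 can be pictured as a small storage interface that the tree code calls instead of LMDB directly. This is a hypothetical sketch: `NodeStore`, `MemNodeStore`, and the integer node IDs are illustrative names, not Quadrable's actual classes, and it leaves out the transaction and database-handle abstractions (the `lmdb::txn` and `lmdb::dbi` equivalents) that a real port would also need.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// HYPOTHETICAL interface for illustration only (not Quadrable's real API).
// Tree code would call these methods instead of talking to LMDB directly.
class NodeStore {
  public:
    virtual ~NodeStore() = default;
    // Point read of one serialised node; std::nullopt if it doesn't exist.
    virtual std::optional<std::string> get(std::uint64_t nodeId) const = 0;
    // Insert or overwrite a serialised node.
    virtual void put(std::uint64_t nodeId, std::string_view record) = 0;
    // Remove a node (no-op if absent).
    virtual void del(std::uint64_t nodeId) = 0;
};

// Ephemeral in-memory backend, similar in spirit to the "MemStore" mode.
// A LevelDB/RocksDB/IndexedDB backend would implement the same methods.
class MemNodeStore : public NodeStore {
  public:
    std::optional<std::string> get(std::uint64_t nodeId) const override {
        auto it = nodes.find(nodeId);
        if (it == nodes.end()) return std::nullopt;
        return it->second;
    }
    void put(std::uint64_t nodeId, std::string_view record) override {
        nodes[nodeId] = std::string(record);
    }
    void del(std::uint64_t nodeId) override { nodes.erase(nodeId); }

  private:
    std::map<std::uint64_t, std::string> nodes;
};
```

Because traversing the tree is dominated by calls like `get`, the read latency of whichever backend implements this interface is what matters most.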
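For point 3, here is a minimal sketch of one way to compress integer keys while preserving their lexical ordering. It is an assumed length-prefixed scheme for illustration only (`encodeOrderedInt` and `decodeOrderedInt` are hypothetical helpers), not Quadrable's actual encoding: one length byte followed by only the significant big-endian bytes, so small integers produce short keys and byte-wise comparison still matches numeric order.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// HYPOTHETICAL order-preserving integer encoding (not Quadrable's format):
// [length byte = number of significant bytes, 0..8][significant bytes, big-endian]
std::string encodeOrderedInt(std::uint64_t v) {
    std::string body;
    while (v != 0) {
        // Prepend the lowest byte so the result ends up big-endian.
        body.insert(body.begin(), static_cast<char>(v & 0xFF));
        v >>= 8;
    }
    // A value with fewer significant bytes gets a smaller length byte,
    // so it sorts before any larger value.
    return std::string(1, static_cast<char>(body.size())) + body;
}

std::uint64_t decodeOrderedInt(const std::string &enc) {
    std::uint64_t v = 0;
    // Skip the length byte and reassemble the big-endian payload.
    for (std::size_t i = 1; i < enc.size(); i++) {
        v = (v << 8) | static_cast<unsigned char>(enc[i]);
    }
    return v;
}
```

For example, 255 encodes to 2 bytes and 256 to 3 bytes, and the shorter encoding sorts first because its length byte is smaller (`std::string` comparison treats `char` as unsigned). Keys 1 through 255 share only the 8-bit length prefix, rather than the 56 leading zero bits they would share as big-endian uint64 values. The length byte itself is a small fixed overhead, comparable to the "few extra bits" mentioned above.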
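The batch amortisation in point 3 can be illustrated by counting how many internal-node hashes an update needs. This sketch assumes a perfect binary tree with heap-numbered nodes (`hashesForBatch` and `hashesOneAtATime` are hypothetical helpers). Quadrable's real tree is sparse, so actual counts differ, but the effect is the same: where the paths from modified leaves to the root overlap, the shared nodes are rehashed only once.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Counts the internal-node hashes needed to update `leaves` as ONE batch in a
// perfect binary tree with 2^depth leaves (a simplifying assumption).
// Heap numbering: root = 1, node n has children 2n and 2n+1, and leaf i is
// node 2^depth + i. Leaf hashes themselves are not counted.
std::size_t hashesForBatch(unsigned depth, const std::vector<std::uint64_t> &leaves) {
    std::set<std::uint64_t> dirty;  // internal nodes that must be rehashed
    for (std::uint64_t leaf : leaves) {
        std::uint64_t node = (std::uint64_t{1} << depth) + leaf;
        while (node > 1) {
            node >>= 1;  // step up to the parent
            // Already dirty: every ancestor above it was counted too.
            if (!dirty.insert(node).second) break;
        }
    }
    return dirty.size();
}

// Cost if each leaf were updated in its own batch: one hash per level.
std::size_t hashesOneAtATime(unsigned depth, std::size_t numUpdates) {
    return numUpdates * depth;
}
```

In an 8-leaf tree (depth 3), updating leaves 0 and 1 separately costs 3 + 3 = 6 internal hashes, but as one batch it costs 3, because both leaves share all their ancestors. Updating all 8 leaves costs 7 hashes as a batch versus 24 one at a time.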
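The branching-factor comparison in point 4 generalises to simple arithmetic: a membership proof in a perfectly balanced tree needs (b - 1) sibling hashes at each level, so a larger b means fewer levels but more witnesses per level. A small sketch with hypothetical helper names (`witnessCount`, `levelsFor`):

```cpp
#include <cstdint>

// Witnesses (sibling hashes) in a membership proof for one element of a
// perfectly balanced tree: each level contributes (branchingFactor - 1).
std::uint64_t witnessCount(std::uint64_t branchingFactor, std::uint64_t levels) {
    return (branchingFactor - 1) * levels;
}

// Smallest number of levels such that branchingFactor^levels >= numElems.
// Assumes branchingFactor >= 2 and numElems small enough not to overflow.
std::uint64_t levelsFor(std::uint64_t branchingFactor, std::uint64_t numElems) {
    std::uint64_t levels = 0, capacity = 1;
    while (capacity < numElems) {
        capacity *= branchingFactor;
        levels++;
    }
    return levels;
}
```

With 4 elements this reproduces the example above: 2 witnesses for BF 2 and 3 for BF 4. With 1,048,576 (2^20) elements, BF 2 needs 20 witnesses, while BF 16 needs only 5 levels but 15 × 5 = 75 witnesses.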

Nuhvi commented

@hoytech Thank you very much for your answer.

You are right. After thinking about the size of the proof, I thought this could only work with something like Blake3 and Bao, but then I would just be offloading the binary tree implementation to someone smarter than myself, which is definitely a win. But I don't think either supports incremental rehashing by reusing old trees and diffing. Maybe it's worth asking for that feature. I was considering using Blake3 for hashing nodes and Bao for streaming media files anyway.

I suspect Blake3 won't be suited to optimisations like Quadrable's empty nodes, but maybe it already supports them; I haven't fully grokked it yet.

Thanks again.