celestiaorg/nmt

Use leaf hashes only

Opened this issue · 3 comments

NMT is built on top of some existing data structure. It is used to prove the absence or presence of some subset of data. In the case of Celestia, it is proving shares within an extended data square.

As the square already contains the underlying data, it is an unnecessary allocation of memory to also store the entire data in the leaves. Rather, in Push the hash of the contents should be provided alongside the namespace ID. Note the hashing function for the leaves can be different for the hashing function used for the rest of the tree. Doing this will keep the tree as lightweight as possible.

Furthermore, we may look to remove the Get and GetWithProof methods.

liamsi commented

As the square already contains the underlying data

true

it is an unnecessary allocation of memory to also store the entire data in the leaves

while I agree I wanted to note as the data is passed in as a slice, it is not really that wasteful (basically a reference).

in Push the hash of the contents should be provided alongside the namespace ID

Not sure. Counter arguments: if the hashing of the leaves is left to the caller, the tree becomes more difficult to be used correctly (e.g. no domain separation between inner and lead nodes) also this is quite uncommon as a merkle tree is supposed to hash the data.

That said, you are right that currently no one uses Get or GetWithProof and hence even the refs to the orig data are unnecessary.

Not sure. Counter arguments: if the hashing of the leaves is left to the caller, the tree becomes more difficult to be used correctly (e.g. no domain separation between inner and lead nodes) also this is quite uncommon as a merkle tree is supposed to hash the data.

Yeah on second thought I think the nmt should have full control of the hashing method used