celestiaorg/nmt

The VerifyInclusion() function does not check proof completeness

Closed this issue ยท 4 comments

Problem

This issue was identified while working on #128
The VerifyInclusion method does not check the completeness of the NMT proofs. It mostly tries to mimic a normal Merkle inclusion proof without consideration of namespaces. However, based on how it is being used in Celestia-app, the completeness check seems necessary.

Usage of VerifyInclusion in Celestia-app:
The VerifyInclusion is invoked by the func (sp ShareProof) VerifyProof() bool which aims to verify the inclusion proofs of a set of shares with identical namespace ID to their corresponding rows roots of the extended data square. As the verifyCompleteness paramteter defaults to false in the VerifyInclusion, the completeness of the supplied shares in the VerifyProof() is ignored.

This issue is created to clarify whether this is an intended behaviour or not? and to document the rational in the code and in the NMT specification.

liamsi commented

I think the motivation for this was that VerifyInclusion could also be used to verify a sub-set of a namespace (say a continuous same-namespace range, which is not the whole namespace; as the whole namespace is supposed to be verified via VerifyNamespace instead). I can see that this is not clear and can easily be misused.

@evan-forbes @Wondertan @renaynay: Are there any other usages of VerifyInclusion? Specifically, does the namespace-subset inclusion verification even make sense at all for any use-case? I think one idea that I thought about back then was e.g. to that the header of a rollup but not the full rollup's data can be checked for inclusion.

I see, thanks for the clarification, it makes sense now. ๐Ÿ™

However, the implementation of VerifyInclusion may still cause confusion for those without additional context about the usage of nmt in Celstia.
When I was extracting the normal index-based Merkle range proof and verification logics from the nmt library to include in the nmt spec, it initially looks like that VerifyInclusion() and ProveRange() were a pair of functions, where the output of ProveRange() could be verified by VerifyInclusion. This is how they are used in the tests (e.g., https://github.com/celestiaorg/nmt/blob/master/proof_test.go#L195-L199). However, this is not the case as VerifyInclusion can only verify some of the proofs generated by ProveRange, specifically those generated for a range of leaves with identical NID. In the nmt spec, I explained the normal index-based range proof and verification based on the current implementation (#162). However, I suggest implementing a generic function (if we think it is going to be useful) that can verify all types of proofs generated by the ProveRange function (instead of the current application-specific one i.e., VerifyInclusion()).

[Suggestion (not urgent)] Separately, it may be beneficial to refactor the code so that the leaves passed to VerifyInclusion are already namespaced by the caller (rather than adding namespaces inside the VerifyInclusion function i.e., https://github.com/celestiaorg/nmt/blob/master/proof.go#L308-L312). Adding namespaces looks like a helper function for the caller scope and is not directly relevant to the nmt library.

Alternatively, if we find out that there is no use case for the namespace subset inclusion, then we could also proceed as follows (in favour of minimizing the exposed APIs):
1- Refactor the celestia-app code so that namespaces are added to the leaves/shares prior to nmt proof verification e.g., in here
2. Use the VerifyNamespace() for the proof verification
3. Remove VerifyInclusion from the nmt.

to confirm what @liamsi said, VerifyInclusion was in fact for some arbitrary set of shares that share a namespace, but it makes no guarantee that the namespace is complete

we are currently using VerifyInclusion for verifying celestia tx inclusion, or proving a specific set of shares that are already determined. For example, in the qgb, or for a rollup with using a committee based fork choice rule (shared sequencer, centralized sequencer), where completeness isn't really relevant.