paritytech/smoldot

How to decode storage proof into key-value pairs and encode them to storage proof?

xlc opened this issue ยท 11 comments

xlc commented

Use case: I want to mock the validation data from relaychain by decoding the trie proof, modify the value, and generate new proof and rebuild the block.

I am trying to find the right API to use in https://github.com/paritytech/smoldot/tree/main/src/trie but I can only find how to verify proof by particularly decoding it, how to create trie and get state root. But not decode proof into key-value pairs and encode the key-value pairs into storage proof.

Does smoldot have this API available? Otherwise I may need to use Substrate to do that, which comes with lot more other issues.

There's no way to do that at the moment. The functions are really tailored to the light client use case, and the light client never encodes a proof.

I would like to add some way to get the list of all elements of a proof, though. Right now, we iterate through the storage proof every time a runtime call tries to access a value, which has a bad O complexity.

xlc commented

This is what I have figured. Let's see if smoldot implements this first or cumulus fixes the wasm compile issue first...

I am happy to implement the encode proof part myself in chopsticks as long as I can reuse helpers in smoldot so that I don't have to reimplement everything.

Encoding a proof should be relatively easy by combining the trie_structure module and proof_node_codec::encode. I'm just not sure where I would put this proof encoding function.

However, proof_node_codec::encode only supports non-compact proofs. I'm honestly lost as to which protocols use a compact proof and which use a non-compact proof.

One interesting thing to note is that when making a runtime call against a compact proof, it could in theory be more efficient to just leave the proof as it is and re-decode it every time. This is because a compact proof is basically a tree.
In other words, if the proof contains for example 1000 entries but we only need 2, then it is possible to access only these 2 entries in O(log n) time.

However, we should only do this if the proof is correct, and verifying that the proof is correct is O(n) anyway, so the best solution is simply to verify the entire proof and keep a list of where in the proof each storage value is found.

After #3021, all that remains to do is:

  • Add some getters to DecodedTrieProof to get the list of entries in the proof (I'm not doing it now to avoid merge conflicts).
  • Add the proof encoding code. As I've mentioned above, it's quite easy to write if you know how proofs are encoded, as all the primitives are already there. Similarly, I'm not doing it right now to avoid conflicts with #3021

The encoding is actually more complicated than expected, because we need to provide the hashes of nodes that are in the trie but not in the proof. This makes an API kind of complicated to design.

I've implemented encoding proofs in #3033, however if you modify a value the proof will be invalid, as the hashes don't match anymore. I intentionally don't check the hashes in #3033, because normally when you generate a proof you do so by reading an already existing trie, and thus checking the hashes is normally unnecessary.

What I can do is add a ProofBuilder::adjust_hashes function that makes the proof correct. However, the hash of the root of the proof will obviously no longer match the hash in the block header, so you'd have to tweak whatever code is using the proof in order for the proof to be accepted even when it doesn't match the expected hash.

xlc commented

As long as the proof is valid, that should be enough for my use case. I can modify the state_root on relaychain in the test client to match the generated proof.

After all the PRs that I have opened are merged, it should be possible to do what you want to do.

I'm not providing any straight-forward utility at the moment because it seems very niche to me, but the idea is:

  • Use proof_decode::decode_and_verify_proof to decode the proof you want to modify.
  • Create a proof builder with proof_encode::ProofBuilder::new().
  • Use DecodedTrieProof::iter_ordered to iterate over the elements of the decoded proof.
    • If you find the element that has the key that you want to modify, then use proof_node_codec::decode to decode the ProofEntry::node_entry, modify the Decoded::storage_value to contain StorageValue::Unhashed, then use proof_node_codec::encode_to_vec to turn the thing back into a Vec<u8>. Then call ProofBuilder::set_node_value with the key, with the newly-encoded node value, and with None.
      • You could also in principle set StorageValue::Hashed and provide an unhashed node value when calling ProofBuilder::set_node_value. You can just pass a dummy value in the StorageValue::Hashed, as calling make_coherent later will automatically calculate it. The only difference between StorageValue::Unhashed and StorageValue::Hashed is that it will result in two different encodings and two different hashes, but for you it doesn't matter since you don't care about this.
    • For any other element, just directly call ProofBuilder::set_node_value with the key, the ProofEntry::node_value, and the ProofEntry::unhashed_storage_value.
  • (optional) Call ProofBuilder::missing_node_values and make sure that it returns an empty iterator. If any node value is missing, the proof will be invalid. Assuming that you did everything as explained, then it should always be empty.
  • Call ProofBuilder::make_coherent in order to update the hashes of everything and make the proof valid.
  • Call ProofBuilder::trie_root_hash to get the new root hash, if necessary.
  • Call ProofBuilder::build_to_vec to get the new proof.

Closing this issue, since that's now been done.