Add hash commit for serialized JSON as part of amino encoding
Closed this issue · 9 comments
Currently the amino format for encoding data is as follows:
[3]byte{type} | []byte{protoEncodedData}
We propose adding an optional encoding format (to be decided at codec creation) that will encode data as follows:
[3]byte{type} | [32]byte{sha256(JSON)} | []byte{protoEncodedData}
Where sha256(JSON) is the sha256 hash of the minified JSON representation of the protoEncodedData.
If this method were implemented and used in the store of a Tendermint-based chain, then full nodes could serve clients JSON representations that include light client proofs, which could be verified without the need to deserialize the amino-encoded data.
Interesting! It should probably hash the sorted minified JSON blob, right? Also, are there any alternatives that don't require additional overhead? Serialization is already pretty slow as-is.
32 bytes per state entry is a very large overhead. Since state tree size optimizations haven't been prioritized (and won't be for the foreseeable future, AFAIK), I think this is something that needs a lot of consideration.
The big question I have is: can't the clients' needs be met by using Rust amino to convert an amino blob to a JSON blob? When I mentioned this to @jackzampolin, there was concern about the wasm code size. However, the Rust code can be compiled to WASM with Rust's size optimizations, and https://github.com/rustwasm/wasm-snip can be used to remove the remaining unused runtime. I think this should meet clients' size requirements, though I don't have direct experience with it. The rustwasm book also has a reference on reducing wasm blob size, which may suggest some structural changes to the Rust code that help reduce code size: https://rustwasm.github.io/docs/book/reference/code-size.html
I also don't think the wasm blob being large should be a huge problem, since it can be loaded once and cached.
The only alternative we've thought of is storing schema descriptions for the data in the store, under the typ3 key.
@alexanderbez wrote:
Also, are there any alternatives that don't require additional overhead? Serialization is already pretty slow as-is.
That is my concern as well. Benchmarks may help assess the impact.
Here is how I think of the trade-offs.
We have three goals:
1. State machine correctness and state machine developer ergonomics
2. Performance
3. Client developer ergonomics
Right now we have a decent story for 1 and a terrible one for 2 and 3.
This solution further sacrifices 2 for big improvements in 3.
There are alternative solutions, involving aligning more closely with protobuf, that improve both 2 and 3.
The idea to use wasm should be explored further before either building implementations in other languages or modifying (and breaking) the existing amino encoding. If possible, I would rather compile the Go code to wasm (instead of the Rust implementation, which isn't complete):
https://github.com/golang/go/wiki/WebAssembly
The proposed alternative to this is to start looking at specific responses from the store that we can handle with protobuf. I think the best starting point would be a protobuf message that can handle the account query.
I think the way to move forward here is to make amino fully protobuf-compatible (even the parts that use prefix bytes). The discussion is happening here: #267, and this milestone gives an overview of the work that needs to happen: https://github.com/tendermint/go-amino/milestone/1 (might not be complete yet)
I'd suggest to close this for now. Feel free to re-open when needed!