Possible differences of the implementation compared to Polkadot's SCALE specification
As a part of our efforts to rebuild the Substrate docs, I'm currently setting up a small page about this Rust implementation of the SCALE codec. During my research I noticed a few curious things. My assumption is that the Polkadot specification of SCALE[^1] was used as a reference for this implementation. This might not be true, but certainly there must exist some sort of ground truth for all the different implementations of SCALE somewhere.
Finding: I noticed that the compact encoding of integers[^2] seems to differ from what the specification describes.
As an example, we can look at the first relevant integer and its compact encoding:

```rust
use parity_scale_codec::{Compact, Encode};

let x: u32 = u32::pow(2, 30);
println!("{:02x?}", Compact(x).encode());
```
The output of this is `[03, 00, 00, 00, 40]`. Here the upper six bits of the first byte are equal to 0, and `0x03` results from the mode bitflag `11` appended as the least-significant bits. We have `((4 - 4) << 2) | 0b11 = 0x03`, since the upper six bits encode the number of data bytes minus 4.
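To make the bit layout concrete, here is a small sketch of my own (the helper `first_byte` is just for illustration, not part of the crate):

```rust
// Big-integer mode (0b11): the upper six bits of the first byte hold
// the number of data bytes minus 4; the data bytes follow little-endian.
fn first_byte(data_bytes: u8) -> u8 {
    ((data_bytes - 4) << 2) | 0b11
}

fn main() {
    // 2^30 needs 4 data bytes, so the upper six bits are 4 - 4 = 0.
    assert_eq!(first_byte(4), 0x03);

    // Full encoding: the length byte, then the value little-endian.
    let x: u32 = u32::pow(2, 30);
    let mut encoded = vec![first_byte(4)];
    encoded.extend_from_slice(&x.to_le_bytes());
    assert_eq!(encoded, [0x03, 0x00, 0x00, 0x00, 0x40]);
}
```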
However, if I interpret the definitions of the specification correctly, it differs from this implementation in that the byte used for encoding the length (i.e., the first byte) is counted as well. This can be seen in the specification's definition of the big-integer mode, where the upper six bits appear to encode the total length of the encoding, including the first byte, minus 4.

Coming back to the example above, the desired output according to the specification would be `[07, 00, 00, 00, 40]`, where `0x07` is the compact encoding of the total length 5: `((5 - 4) << 2) | 0b11 = 0x07`.
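For contrast, a hypothetical encoder that follows my reading of the spec and counts the length byte itself would produce `0x07`:

```rust
// Hypothetical spec-style first byte: the upper six bits hold the total
// length of the encoding (the length byte plus the data bytes) minus 4.
fn first_byte_spec(data_bytes: u8) -> u8 {
    let total = data_bytes + 1; // also count the length byte itself
    ((total - 4) << 2) | 0b11
}

fn main() {
    // For 2^30: the total length is 5, so the upper six bits are 5 - 4 = 1,
    // giving 0b000001_11 = 0x07 instead of 0x03.
    assert_eq!(first_byte_spec(4), 0x07);
}
```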
Now, this seems like a minor difference and, in fact, I think the implementation here is better[^3], but that's beside the point. I'm just wondering whether this could cause any harm, especially if other implementations of SCALE encode exactly as defined in the specification. I'm currently not knowledgeable enough to gauge the implications, so I wanted to share what I've found.
Footnotes

[^1]: https://spec.polkadot.network/id-cryptography-encoding#sect-scale-codec
[^2]: There exists a similar issue with vector encodings, but since it is a bit more complicated, I will refrain from elaborating here until some sort of relevancy is established.
[^3]: My understanding is that the available 6 bits are used to encode m − 4 (instead of, e.g., m) to ensure they are utilized as efficiently as possible. Not counting the first byte allows us to compact-encode integers up to $2^{(63+4) \cdot 8} - 1 = 2^{536} - 1$ (see the sketch after these footnotes).
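A quick sketch (my own arithmetic, just restating the footnote) of where the $2^{536} - 1$ bound comes from:

```rust
fn main() {
    // Six bits encode m - 4, so at most 0b111111 + 4 = 67 data bytes.
    let max_data_bytes = 0b11_1111 + 4;
    assert_eq!(max_data_bytes, 67);
    // The largest compact-encodable integer is thus 2^(67 * 8) - 1 = 2^536 - 1.
    assert_eq!(max_data_bytes * 8, 536);
}
```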
The spec isn't correct, no need to include the one byte that is always present in the encoding.
Thanks for your reply. This part I do understand. I just wondered if this can result in problems when this implementation is used in conjunction with others. Or did this Rust implementation itself serve as some sort of reference implementation for SCALE?
The Rust implementation is the reference. The spec didn't exist before, at least not the one you mentioned.
Thank you!