toncenter/tonweb

Incorrect varint/varuint API


Preface

Variable-length integers in TON have a surprising definition, and implementing them by hand is error-prone.

The TL-B schema for VarInteger/VarUInteger parameterizes the type by an exclusive upper bound n on the integer's size in bytes. That is, VarUInteger 32 means "an integer up to 31 bytes long", i.e. a 248-bit integer. The syntax #< n denotes a number in the range 0..n-1, stored in the minimal number of bits needed to represent that range.

var_uint$_ {n:#} len:(#< n) value:(uint (len * 8))
         = VarUInteger n;
var_int$_ {n:#} len:(#< n) value:(int (len * 8)) 
        = VarInteger n;
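To make the schema concrete, here is a small sketch that derives the two quantities it implies for a given n: the width of the length prefix and the largest value size. The function names `lenPrefixBits` and `maxValueBits` are made up for illustration and are not part of any TON library.

```javascript
// Width of the `#< n` length prefix: the minimal number of bits that can
// hold every value in 0..n-1, i.e. ceil(log2(n)).
function lenPrefixBits(n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  let bits = 0;
  while ((1 << bits) < n) bits++;
  return bits;
}

// Largest payload, in bits, that VarUInteger n may carry: (n - 1) bytes.
function maxValueBits(n) {
  return (n - 1) * 8;
}

for (const n of [7, 8, 16, 32]) {
  console.log(
    `VarUInteger ${n}: ${lenPrefixBits(n)}-bit prefix, ` +
    `up to ${maxValueBits(n)}-bit value`
  );
}
// VarUInteger 7: 3-bit prefix, up to 48-bit value
// VarUInteger 8: 3-bit prefix, up to 56-bit value
// VarUInteger 16: 4-bit prefix, up to 120-bit value
// VarUInteger 32: 5-bit prefix, up to 248-bit value
```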

The problem

Compare VarUInteger 7 and VarUInteger 8. Both use a 3-bit length prefix, but the permitted sizes are 0..6 bytes (up to 48 bits) and 0..7 bytes (up to 56 bits) respectively. The sets of valid encodings of VarUInteger 5, 6 and 7 are all strict subsets of that of VarUInteger 8, so a correct TL-B encoder or decoder has to enforce the specified bound at runtime. This is indeed what the C++ TL-B compiler does.

The current APIs in ton-core, tonweb and tongo depart from the original definition in the TL-B schema and let the user specify the bit width of the length prefix. Instead of specifying the exclusive upper bound in bytes (n), the user is supposed to specify ceil(log2(n)), the number of bits in the internal length prefix. Such an API makes it impossible to enforce the actual bounds for types such as VarUInteger 7 (see, for example, the definition of StorageUsed). As a result, the user may accidentally encode a larger number than permitted or accept malformed input.
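A minimal sketch of the failure mode, assuming a writer that only knows the prefix width. `writeVarUintByPrefixBits` is a made-up stand-in and not the actual signature of any of the libraries above. Bits are modelled as a string of "0"/"1" characters to keep the example self-contained.

```javascript
// Render a non-negative integer as exactly `width` bits (empty for width 0).
function toBits(x, width) {
  return width === 0 ? "" : x.toString(2).padStart(width, "0");
}

// Hypothetical writer in the criticized style: the caller passes the width
// of the length prefix rather than the TL-B parameter n.
function writeVarUintByPrefixBits(value, prefixBits) {
  if (value < 0n) throw new RangeError("value must be non-negative");
  const byteLen = value === 0n ? 0 : Math.ceil(value.toString(2).length / 8);
  // The only check available here: does the length fit into the prefix?
  if (byteLen >= 2 ** prefixBits) {
    throw new RangeError("length does not fit into the prefix");
  }
  return toBits(byteLen, prefixBits) + toBits(value, byteLen * 8);
}

// 2^48 needs 7 bytes, one more than VarUInteger 7 allows (at most 6 bytes).
// With a 3-bit prefix the writer cannot tell n = 7 from n = 8 and accepts it.
const encoded = writeVarUintByPrefixBits(2n ** 48n, 3);
console.log(encoded.slice(0, 3)); // "111", a length of 7 bytes
```

The resulting bit string is a valid VarUInteger 8 but not a valid VarUInteger 7, and nothing in the call site reveals the difference.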

Suggestion

Redefine the varint APIs in terms of the upper byte bound n, and check the actual size of the number when writing and reading varints.
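One possible shape for such an API, as a sketch only. The names `storeVarUint` and `loadVarUint` are hypothetical and do not exist in tonweb; bits are again modelled as a string of "0"/"1" characters instead of a real cell builder or slice.

```javascript
// Width of the `#< n` length prefix, i.e. ceil(log2(n)).
function lenPrefixBits(n) {
  let bits = 0;
  while ((1 << bits) < n) bits++;
  return bits;
}

// Render a non-negative integer as exactly `width` bits (empty for width 0).
function toBits(x, width) {
  return width === 0 ? "" : x.toString(2).padStart(width, "0");
}

// Encode `value` as VarUInteger n, where n is the exclusive upper bound on
// the byte length, exactly as in the TL-B schema.
function storeVarUint(value, n) {
  if (typeof value !== "bigint" || value < 0n) {
    throw new RangeError("value must be a non-negative bigint");
  }
  const byteLen = value === 0n ? 0 : Math.ceil(value.toString(2).length / 8);
  if (byteLen >= n) {
    throw new RangeError(
      `value needs ${byteLen} bytes, VarUInteger ${n} allows at most ${n - 1}`
    );
  }
  return toBits(byteLen, lenPrefixBits(n)) + toBits(value, byteLen * 8);
}

// Decode a VarUInteger n from the front of `bits`; returns the value and
// the remaining bits. Rejects lengths that fit the prefix but exceed n - 1.
function loadVarUint(bits, n) {
  const prefix = lenPrefixBits(n);
  if (bits.length < prefix) throw new RangeError("truncated length prefix");
  const byteLen = prefix === 0 ? 0 : parseInt(bits.slice(0, prefix), 2);
  if (byteLen >= n) {
    throw new RangeError(`length ${byteLen} is out of range for VarUInteger ${n}`);
  }
  const end = prefix + byteLen * 8;
  if (bits.length < end) throw new RangeError("truncated value");
  const value = byteLen === 0 ? 0n : BigInt("0b" + bits.slice(prefix, end));
  return { value, rest: bits.slice(end) };
}

// The same 7-byte value is accepted for n = 8 and rejected for n = 7.
const sevenBytes = 2n ** 48n;
console.log(loadVarUint(storeVarUint(sevenBytes, 8), 8).value === sevenBytes);
try {
  storeVarUint(sevenBytes, 7);
} catch (e) {
  console.log(e.message);
}
```

Taking n directly keeps call sites aligned with the schema: a field declared as VarUInteger 7 is written with `n = 7`, and both directions reject a 7-byte payload.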

References