ethereum/wiki

Have reference code for RLP decoding reject some invalid encodings

acoglio opened this issue · 9 comments

The reference code for RLP decoding in https://github.com/ethereum/wiki/wiki/RLP accepts the following kinds of invalid encodings:

  1. [129, x] with x < 128 -- this should be encoded as [x].
  2. [183+LL, L1, ..., LLL, ...] with 1 ≤ LL ≤ 8 and L1 = 0 -- a multi-byte length should be encoded without leading zeros.
  3. [183+LL, L1, ..., LLL, ...] with 1 ≤ LL ≤ 8 and L = L1*256LL-1 + ... + LLL < 56 -- this should be encoded as [128+L, ...].
  4. [247+LL, L1, ..., LLL, ...] with 1 ≤ LL ≤ 8 and L1 = 0 -- a multi-byte length should be encoded without leading zeros.
  5. [247+LL, L1, ..., LLL, ...] with 1 ≤ LL ≤ 8 and L = L1*256LL-1 + ... + LLL < 56 -- this should be encoded as [192+L, ...].

These kinds of invalid encodings have been discussed in various places, e.g.:

I propose to add the lines preceded by # ADDED: below to the function decode_length of the reference code for RLP decoding in the Wiki. These lines reject the invalid encodings described above.

def decode_length(input):
    length = len(input)
    if length == 0:
        raise Exception("input is null")
    prefix = ord(input[0])
    if prefix <= 0x7f:
        return (0, 1, str)
    elif prefix <= 0xb7 and length > prefix - 0x80:
        strLen = prefix - 0x80
        # ADDED:
        if strLen == 1 and ord(input[1]) <= 0x7f:
            raise Exception("single byte below 128 must be encoded as itself")
        return (1, strLen, str)
    elif prefix <= 0xbf and length > prefix - 0xb7 and length > prefix - 0xb7 + to_integer(substr(input, 1, prefix - 0xb7)):
        lenOfStrLen = prefix - 0xb7
        # ADDED:
        if input[1] == 0:
            raise Exception("multi-byte length must have no leading zero");
        strLen = to_integer(substr(input, 1, lenOfStrLen))
        # ADDED:
        if strLen < 56:
            raise Exception("length below 56 must be encoded in one byte");
        return (1 + lenOfStrLen, strLen, str)
    elif prefix <= 0xf7 and length > prefix - 0xc0:
        listLen = prefix - 0xc0;
        return (1, listLen, list)
    elif prefix <= 0xff and length > prefix - 0xf7 and length > prefix - 0xf7 + to_integer(substr(input, 1, prefix - 0xf7)):
        lenOfListLen = prefix - 0xf7
        # ADDED:
        if input[1] == 0:
            raise Exception("multi-byte length must have no leading zero");
        listLen = to_integer(substr(input, 1, lenOfListLen))
        # ADDED:
        if listLen < 56:
            raise Exception("length below 56 must be encoded in one byte");
        return (1 + lenOfListLen, listLen, list)
    else:
        raise Exception("input don't conform RLP encoding form")

We could also add a paragraph to that Wiki page, just after the code, to explain that invalid encodings with "sub-optimal" lengths must be rejected.

Some examples of the 5 kinds of invalid encodings mentioned above are at https://github.com/ethereum/tests/blob/develop/RLPTests/invalidRLPTest.json, in the "out" fields of the JSON objects with the following names:

  1. bytesShouldBeSingleByte00, bytesShouldBeSingleByte01, bytesShouldBeSingleByte7F
  2. leadingZerosInLongLengthArray1, leadingZerosInLongLengthArray2
  3. nonOptimalLongLengthArray1, nonOptimalLongLengthArray2
  4. leadingZerosInLongLengthList1, leadingZerosInLongLengthList2
  5. nonOptimalLongLengthList1, nonOptimalLongLengthList2

Sorry @acoglio this subject is a little beyond my knowledge, I posted to some dev channels and no one has given any feedback. One other option if you feel strongly is to just make the change yourself in the wiki and someone else will likely tell you later if tat was a bad idea. Sometimes that's the best way to get an opinion 😆

@ChrisChinchilla Thanks, I'll make the change. What are the mechanics for making a change to this Wiki? I see no Edit button on the RLP Wiki page. Do I 'Clone it locally', edit RLP.md, and push?

@acoglio Gah, looks like permissions have changed then, we were getting too many unhelpful changes. Wikis work differently from normal repos. If it's OK with you, I'll make your changes then and credit you in the comments.

@ChrisChinchilla That's fine with me, thanks.

Besides the additions to the code above, we could also add the following paragraph just after the code:

Note that the decode_length function rejects invalid encodings that have "non-optimal" lengths, namely (1) singleton strings whose only byte is below 128 that are encoded with a short (i.e. one-byte) length of 1 instead of as the strings themselves and (2) strings and lists with long (i.e. multi-byte) lengths with leading zeros (which must be absent) or below 56 (which should be encoded using short lengths).

Thanks, @ChrisChinchilla. By the way, the five # ADDED: comment lines in the new code were only meant to show the added lines in this Issue, but were not meant to be part of the updated code in the Wiki. Would you mind removing them, when you get a chance? (I should have mentioned that earlier, sorry for the extra work.)

Ahhh @acoglio I wasn't sure, sorry. Will remove now.