barryWhiteHat/roll_up

Python Test Preprocessing of Witness Arguments

jheinnic opened this issue · 2 comments

I've been looking at the preprocessing behavior of the Python test case in order to wrap my head around the interface to this library. There is some logic that, according to the documentation, is changing the endian-ness of the public keys.

As far as I've seen the term used before, "reversing the endianness" of a series of bytes requires reversing the order of bytes in the series without also reversing the order of bits within each byte. There is such a thing as 16-bit endianness and 32-bit endianness, but without such qualification, my understanding is that 8-bit bytes are the usual word size.

Big-endian order feels natural to native English speakers as we typically write our numerals left to right in decreasing order of significance. One might send "120" emails in a day in decimal big-endian, or 021 emails in decimal little-endian. And 8-bit little-endian is sometimes also known as "network byte order"

So consider a 4-byte string encoded in hexadecimal using big-endian byte order. Imagine that looks like 8c94ab3f. Security absurdities aside, lets presume this happens to have been one coordinate of a generated public key that we want to put into the witness for a roll_up proof. If the documented requirement is that this is sent in little-endian order, I would expect the required input to be 3fab948c. The content and relative order of each pair of hex characters would have been preserved, reflecting the fact that I have not changed the bit order within each byte, but the order in which each pair of characters occurs has been reversed.

Using the same methods as the Python test case, however, I wind up with the value fcd52931. Looking more closely at this value, I can see that it is a bit-by-bit reversal of the input string, not an endianness reversal. The following uses some helper routines I used to capture useful subsets of the complete statement used in the Python test suite, but contains the very same logic:

>>> hex(binaryToInt(hexToBinary("0x8c94ab3f"))) # This is documented to reverse the endianness
'0xfcd52931'
>>> hexToBinary("0x8c94ab3f"))
[1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1]
>>> hexToBinary("0xfcd52931")
[1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

A theory that the test case is actually revering the entire bit sequence, not altering its endianness is supported by the fact that, at least for this input, it is symmetrical:

>>> hex(binaryToInt(hexToBinary("0xfcd52931")))
'0x8c94ab3f'

...However, it is also possible to provide inputs where the passing the output of the first call to the input of a second call does not is asymmetric, but when that output is passed to a third call, we find that the sequence stabilizes on a symmetrical pair of reversible bit sequences...

>>> hex(binaryToInt(hexToBinary('0x2c94ab38')))
'0x7354a4d'
>>> hex(binaryToInt(hexToBinary('0x7354a4d')))
'0x5929567'
>>> hex(binaryToInt(hexToBinary('0x5929567')))
'0x7354a4d'

Looking at the bit-strings makes it easy to see why things have become so apparently scrambled:

>>> hexToBinary('0x2c94ab38')
[1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0]
>>> hexToBinary('0x7354a4d')
[1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1]
>>> hexToBinary('0x5929567')
[1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1]

In the first iteration, we not only reversed the sequence, we also lost the trailing 0s. By chance, the number of affected digits was not an even multiple of either a byte or a nibble, so we transposed every remaining character by a bit shift operation before the sequence reached equilibrium.

If the goal really is to switch endianness, the easiest path I can see to doing so would be to use the bitstring import dependency already present in this projects' requirements.txt file. Assuming that public_key[j-1][0] is one of two coordinates we want to encode as a hex string using non-default endianness:

from bitstring import BitArray;

swapUtil = BitArray(public_key[j-1][0]);
swapUtil.byteswap(0)
altEndianHex = swapUtil.hex;

This shouldn't be hard to get implemented in a deterministically clear way once its understood what the contractually expected Witness input should be. I'm attaching a bit of code I extracted from the rest case and its utilities to help observe what was happening here.

Archive.zip

Thanks so much for taking the time to investigate this.

The code is quite poor quatility now. It was made to illustrate the logic that this system need. It requires a bunch of work to get it into good shape. If you are interested in working on this code I would be happy to have a call with you and we can do kind of a joint code review. To help explain what is going on. Let me know if you are interested.

Looking more closely at this value, I can see that it is a bit-by-bit reversal of the input string, not an endianness reversal.

This is correct. The comment is wrong.

Using bitstring seems like a good idea. I would be happy ot merge a PR to add them.

I think there is no issue with this encoding. I imagine that these values end up somewhere in libsnark and reversed binary string is the native representation for it. i2lebsp from https://github.com/zcash-hackworks/zcash-test-vectors/blob/master/sapling_utils.py does such a reverse