cyberphone/json-canonicalization

Dealing with "unusual" integers

cyberphone opened this issue · 4 comments

Although a bit surprising, the IEEE-754 double precision format can represent a few integers with higher precision than usual. An example is 2**68, which is internally represented as

0 10001000011 0000000000000000000000000000000000000000000000000000

which is exactly 295147905179352825856.

However, using the number serialization specified by JCS you instead get 295147905179352830000, effectively dropping five digits of precision.
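This is easy to reproduce in Python (used here only for illustration): its repr() selects the same shortest round-trip digits as the ES6 algorithm JCS mandates, differing only in exponent notation.

```python
n = 2**68                    # 295147905179352825856
d = float(n)                 # a power of two, hence exactly representable
assert int(d) == n           # the double really holds all 21 digits

# The shortest decimal string that round-trips needs only 17 digits:
shortest = repr(d)           # '2.9514790517935283e+20'
assert float(shortest) == d  # parses back to the identical double
# In ES6 positional notation this reads 295147905179352830000
```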

This is "by design" because the assumption is that few (if any) serializers bother looking for these edge cases, for the simple reason that doing so would add complexity and no real value. If you need big integers, you should use a big-integer data type.

For parsing, it is assumed that proper rounding is used, which makes the internal representation as exact as it can be, including for the 2**68 example.
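In Python terms (a sketch for illustration), correctly rounded parsing maps the 17-digit JCS form back to the very same double:

```python
# Correct (round-to-nearest) parsing of the rounded JCS output yields
# exactly the same bit pattern as the original 2**68:
assert float("295147905179352830000") == 2.0**68
assert int(float("295147905179352830000")) == 2**68  # all 21 digits back
```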

jmgnc commented

The big issue is that this requires other languages to always parse numbers as 64-bit floating point numbers so that they can properly round these large values.

In the case of Python, it will parse the less precise number, and if you don't round-trip the number through float, you'll end up with the wrong value. It also means that printing the number is more complicated, as a simple '%.0f' will not work.
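For illustration, the standard json module shows the problem: a bare integer literal is parsed as an arbitrary-precision int, so the IEEE-754 round trip has to be forced by hand:

```python
import json

# json parses an integer literal as Python int, not as a double:
v = json.loads("295147905179352830000")
assert v != 2**68                # the literal digits, not the intended value

# Only an explicit round trip through float recovers it:
assert int(float(v)) == 2**68    # 295147905179352825856
```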

Also, it looks like the last power of two that is precisely printed is 2**54, with 2**55 being rounded, despite both 54 and 55 having the same number of significant figures, and both being less than the 56 bits in the mantissa.
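The boundary is visible in Python, whose repr() is assumed here to pick the same shortest round-trip digits as ES6 serialization:

```python
# 2**54 = 18014398509481984: no shorter decimal round-trips, so every
# digit is printed:
assert repr(float(2**54)) == '1.8014398509481984e+16'

# 2**55 = 36028797018963968: adjacent doubles are now 8 apart, so the
# rounded 16-digit form 36028797018963970 already round-trips and wins:
assert repr(float(2**55)) == '3.602879701896397e+16'
assert float('36028797018963970') == 2.0**55
```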

cyberphone commented

There are two issues here. JCS is intended to work with JSON data having the JSON Number type limited to the range supported by IEEE 754; anything else would be a complete revision. See #2

Supplying excessive digits of precision is compliant with JCS, albeit currently poorly described.

Whether systems internally support higher precision (or range) is not a problem.

The way numbers are printed (serialized) in JCS MUST follow the method specified by ECMAScript version 6, which required some 2,000 lines of code for Java. The Python solution wasn't too difficult.
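As a rough illustration of why the Python side is easier (a hypothetical sketch, not the library's actual code): repr() already yields the ES6 shortest round-trip digits, so what remains is mostly notation, e.g. ES6 uses positional form for magnitudes below 1e21.

```python
from decimal import Decimal

def es6_like_format(d: float) -> str:
    """Hypothetical sketch of ES6-style serialization of a finite double.
    Ignores NaN/Infinity, negative zero, and the small-magnitude
    exponent rules that a real implementation must also handle."""
    s = repr(d)                      # shortest round-trip digits
    if 'e' in s and abs(d) < 1e21:
        s = format(Decimal(s), 'f')  # expand to positional notation
    if s.endswith('.0'):             # ES6 prints integral values bare
        s = s[:-2]
    return s

print(es6_like_format(float(2**68)))  # 295147905179352830000
```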

The special numbers you are mentioning do not get any special handling, so their true precision is not used in JCS. Whether such numbers are used in the application is another matter, which JCS (fortunately) does not have to worry about. Note: the JSON serializer and the canonicalizer are in many cases separate applications. In Python I had to clone the serializer to cope with sorting and "float".

Summary: from a JCS point of view, 295147905179352825856 and 295147905179352830000 are equivalent; since both are constrained by IEEE 754 double precision, they are bit-wise equivalent as well. As explained in the JCS spec, you can easily use higher precision and range, but not through JSON Number formatting. This is the de facto way for current IETF standards defining JSON objects.
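In practice that means carrying such values as JSON Strings rather than Numbers; a minimal Python illustration of the pattern:

```python
import json

# Values beyond double precision travel as JSON Strings, leaving the
# JSON Number type untouched:
doc = json.dumps({"big": str(2**68)})
assert json.loads(doc)["big"] == "295147905179352825856"
assert int(json.loads(doc)["big"]) == 2**68
```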

It was surprising to realize that certain integers can be represented with higher precision than others, but this does not affect JCS, which doesn't look for these values.