cyberphone/json-canonicalization

Support for uint64

matthiasgeihs opened this issue ยท 18 comments

ieeeF64, err := strconv.ParseFloat(value, 64)

This method does not work for uint64 because float64 precision is insufficient.

Suggested fix, insert before line 236:

// Is it an integer?
if _, err := strconv.ParseUint(value, 10, 64); err == nil {
	return value
}

@matthiasgeihs This is by design because JCS (RFC 8785) requires data to confirm to the JSON subset specified by JavaScript (aka I-JSON). No IETF standards based on JSON does (AFAIK...) go outside of this limit. Numbers outside of float64 must thus be put in quotes.

Thank you for the reply and the clarification.

Would it make sense to check whether precision was lost in translation and throw an error in that case?

Also note that there exists an unofficial rust implementation, serde_jcs, which des not seem to conform with the standard as it encodes uint64 without losing precision.

Hi @matthiasgeihs

Would it make sense to check whether precision was lost in translation and throw an error in that case?

I don't know how to do that for parsing unless you write a parser from scratch. One idea behind JCS was to not require such measures. There are also a bunch of edge cases that would have to be taken in consideration if strict compatibility is to be preserved:
https://datatracker.ietf.org/doc/html/rfc8785#appendix-B

Precision issues and data rewrites are things that a JCS user must deal with.

I'm not particularly fluent in Go but maybe there are ways to control serialization that makes dealing with int64 simple? In my Java implementation I defined an int53 type which flags values that do not fit.

I wasn't aware of serde_jcs, maybe I should give them a call? ๐Ÿ˜

I'm not particularly fluent in Go but maybe there are ways to control serialization that makes dealing with int64 simple? In my Java implementation I defined an int53 type which flags values that do not fit.

Could you point me to that int53 implementation? I couldn't immediately find it and am unsure if I understand how exactly that works and how you use it. Besides that, for our project I believe the correct way to do is, as you suggested, using a string representation for uint64.

I wasn't aware of serde_jcs, maybe I should give them a call? ๐Ÿ˜

Hehe, you don't have to. But I created an issue (l1h3r/serde_jcs#3) at their repo in any case.

Precision issues and data rewrites are things that a JCS user must deal with.

I was about to open an issue related to this, but with this comment, it sounds like I shouldn't.

The issue I see is:

>>> from org.webpki.json.Canonicalize import canonicalize
>>> canonicalize(2**60)
b'1152921504606847000'
>>> 2**60
1152921504606846976

Which IMO should throw an error. Expecting the user to provide a wrapper around JCS to do a inp == out check seems less than ideal. JCS is for use in secure contexts and the defaults should be secure. Allowing the input to be slightly different, but still verify correctly is dangerous. Often programmers will not realize that they should reparse/use the output from the canonicalized form that matches the signature, and not doing so could result in miss behavior as the used data may not match the signed data.

IMO, canonicalize should default to secure behavior, and via optional arguments allow for more generous serializations.

It doesn't look too hard to ensure that convert2Es6Format output is equal to it's input, and raise an error otherwise. This isn't truly ideal, as it allows some integers that are "out of range" to pass through, but would help ensure that expected behavior happens.

Hi Guys,
I must admit that my knowledge of Python is very limited. Anyway, if you convert data to float and then compare the result with the original and throw an exception they differ, wouldn't that solve the problem, assuming that compatibility with RFC 8785 is still wanted?

For int53 overflows it would help to use a more strict JSON encoder. For example with https://github.com/ijl/orjson you get:

import orjson

orjson.dumps(2**53, option=orjson.OPT_STRICT_INTEGER)

Traceback (most recent call last):
  File "...example.py", line 3, in <module>
    orjson.dumps(2**53, option=orjson.OPT_STRICT_INTEGER)
TypeError: Integer exceeds 53-bit range

Not sure about handling floats.

I will do a short test, today or tomorrow.

Just had a look at https://datatracker.ietf.org/doc/html/rfc8785#appendix-B
I guess for canonical round-trip support in python it would need a custom json.JSONEncoder and json.JSONDecoder implementation.

@titusz I have taken a new look at the problem and my conclusion is (how strange it may sound), that it indeed works as it should.

The reason is simply that there are no integers in JSON. As long as the (potentially rounded) value fits in a 64-bit double all is good. This is of course not perfect but this is what you get with RFC 8785 (and JavaScript). Numbers that needs exact representation and do not fit in a 64-bit double, must use String-based encoding schemes. If both sides do not adhere to the same rules, signatures will not validate which is more of a nuisance than a security problem.

2**600 worked as expected since it throwed an exception.

In case you build an entire JSON tool-chain from scratch you can deal with this problem in a better way:
https://github.com/cyberphone/openkeystore/blob/6c840f980bc098fba0aef9cb73dff5a1ba4f55be/library/src/org/webpki/json/JSONObjectWriter.java#L176

The python implementation does not seem to throw an exception for 2**600

>>> canonicalize(2**600)
b'4.149515568880993e+180'

I think generally it would be good practice to throw an exception if a python int/float is NOT serialized into a rfc8785 conformant representation or cannot be deserialized into the same value provided for canonicalization (full round-trip support)

What do you think?

Pardon me, it should have been 2**6000 which doesn't fit in a 64-bit double.

What you mention is a possibility but it is not compliant with the the RFC: https://datatracker.ietf.org/doc/html/rfc8785#section-3.2.2

The Python canonicalizer is not state of the art because that requires a combined parser and canonicalizer like my Java-based system which (optionally) does what you suggest:
https://github.com/cyberphone/openkeystore/blob/411f55ab9166337daa72f33d0b0ee497b29edb44/library/src/org/webpki/json/JSONParser.java#L165

However, for a community solution it is hard to justify the costs given the limited benefits. A JSON -based "schema" must anyway deal with ugly things like date objects.

Adding an integer specific test would though be very simple. Should I add this?

FWIW, I'm currently occupied with a CBOR implementation that has no such issues since CBOR doesn't mix floats, ints and big ints, as well as offering deterministic serialization eliminating all problems of the kind we discuss here:
https://test.webpki.org/csf-lab/home
I'm hooked :)

Ok, I see it is tricky. Yes an integer specific test would be welcome.

Off-Topic: I was also looking into CBOR lately and I liked what I saw. The specification explicitly includes a section about requirements for deterministic encoding... :)

@titusz Done!

Thanx, it was a good idea and best of all, it was simple :)

Thanks for the quick response! I'll look at adding test cases to make sure that this is working.

@jmgurney It surely worked but it broke the RFC so I had to revoke it :(

b6721b0

The problem is in JSON or more specifically in the I-JSON subset that JavaScript natively supports.
If you need 64-bit integers, using String-encoded integers is the only interoperable solution regardless if you sign the data or not. AFAIK, no IETF standard goes outside of I-JSON except for JSON itself :)

Can you explain how/why it broke the RFC? My reading of https://datatracker.ietf.org/doc/html/rfc7493#section-2.2 doesn't say that raising an exception for out of range values is not allowed.

JSON signing is exactly:

For applications that require the exact interchange of numbers with
greater magnitude or precision

and raising an exception to note that this is happening, allowing the author to know to use a string encoding is a good thing IMO.

@jmgurney Indeed. For RGC 8785 flagging out-of-range Numbers would be quite useful.