masaccio/compact-json

JSON object names need to be strings

Closed this issue · 7 comments

First off: thanks for the very useful tool! I am really happy to have found it, it fills a gap in the standard library that I think many people have experienced.

The issue I encountered: in a JSON object, names can only be strings (see 2.2 in the RFC). So if you do:

from compact_json import Formatter
Formatter().serialize({100: 'foobar'})

you should get either {"100": "foo bar"} (this is what the standard json library does), or perhaps raise an exception (this is what the cjson library apparently does). But with compact-json, this gives {100: "foo bar"}, which is invalid JSON. Trying to decode it with

import json
json.loads('{100: "foo bar"}')

raises a json.decoder.JSONDecodeError: "Expecting property name enclosed in double quotes: line 1 column 2 (char 1)".

I think raising an exception is the clearest option here, since silently stringifying the key could lead to unexpected behavior. An optional stringify_names argument could be nice though.

Side note:, I think the json library is actually not handling this case correctly: json.dumps({100: "foo", "100": "bar"} gives {"100": "foo", "100": "bar"} which is invalid JSON since names should be unique.

Found some discussion related to json module here
Personally I'd suggest force converting every non-string the key to string and gives a warning in such conversion.
If you already run into the problem you may try to use hjson which supports unquoted key for loading (it still gives you a dict with string as key)

despite the fact that the documents goes The names within an object SHOULD be unique, {"100": "foo", "100": "bar"} is technically not invalid as there is basically no way to detect this with a context-free grammar. It does result in information loss though.

I've added a silent conversion of non-string keys to strings which mirrors the behaviour of the standard package. I was on the fence about whether to issue a runtime warning, but I decided that I'd replicate the silent behaviour of the standard package to ensure that the JSON is valid.

I agree @ZeroRin that the behaviour regarding unique keys is broken in the standard package and also in compact-json.

>>> json.dumps({int(100): "foo", float(100): "bar"})
'{"100": "bar"}'
>>> json.dumps({int(100): "foo", str(100): "bar"})
'{"100": "foo", "100": "bar"}'

If the intent is well-formed JSON (surely it is!) then I do think that unique keys plus a runtime warning is right here.

Note that {int(100): "foo", float(100): "bar"} itself gives you a python dict {100: "bar"}, the difference in those two lines is irrelavent to the json package.

Actually on reflection, I've implemented this:

https://github.com/masaccio/compact-json#runtime-warnings

That is a good solution, thanks for the fast response and implementation!

I don't think it's a good idea to skip duplicated keys, that means the serialize action itself becomes information lossy. Write down everything and warn users that loading the json text again might cause information loss seems a better option to me. (for example, users may manually edit the output file to fix it if it's a one-time job, and don't need to revisit the code or the source data)

If you insists to output json with unique key, I‘d suggest overwriting instead of skipping during duplication, that would be more consistent to python dict behaviour, and we'd have json.loads(json.dumps(obj))==json.loads(formatter.serialize(obj))

Fair challenge @ZeroRin. Switching to overwrite is better.