json containing illegal whitespace between terms is deemed valid
7stud opened this issue · 2 comments
The Unicode whitespace character EM SPACE (U+2003, e2 80 83 in UTF-8) is an illegal character between JSON terms. See any vintage of the JSON RFC, which says:
Insignificant whitespace is allowed before or after any of the six
structural characters.
ws = *(
%x20 / ; Space
%x09 / ; Horizontal tab
%x0A / ; Line feed or New line
%x0D ) ; Carriage return
That means the only legal whitespace characters are four of the ASCII whitespace characters: space, horizontal tab, line feed, and carriage return.
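As a rough illustration of the grammar above, here is a minimal Python sketch of the `ws` rule. The names `JSON_WS` and `is_json_whitespace` are made up for this example and do not come from any library.

```python
# The four code points the RFC's `ws` rule allows between JSON terms.
# JSON_WS and is_json_whitespace are hypothetical names for illustration.
JSON_WS = frozenset("\x20\x09\x0a\x0d")  # space, tab, line feed, carriage return


def is_json_whitespace(ch: str) -> bool:
    """Return True only if ch is one of the four characters in `ws`."""
    return ch in JSON_WS


EM_SPACE = "\u2003"

print(is_json_whitespace(" "))            # plain ASCII space
print(is_json_whitespace(EM_SPACE))       # EM SPACE is not in the ws rule
print(EM_SPACE.isspace())                 # Unicode still classifies it as whitespace
print(EM_SPACE.encode("utf-8").hex(" "))  # its UTF-8 bytes
```

The point of the sketch is that "is whitespace according to Unicode" and "is whitespace according to the JSON grammar" are different tests, and a linter needs the second one.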
Even though JSON text may contain Unicode characters encoded as UTF-8, that does not mean that you can put any Unicode character between JSON terms. Putting an EM SPACE between terms is conceptually similar to putting, say, a z between JSON terms.
Here is a simple example of JSON with an EM SPACE (not an ordinary space) after each comma:
[1, 2, 3]
jsonlint.com reports that as valid JSON. The linter should report it as invalid JSON. Why is that important? Suppose you are using Ruby on Rails and Rails spits out an error due to an unexpected token when parsing JSON with an EM SPACE between terms. If you then check with a JSON linter as the authority on what is legal JSON, and the linter says the JSON is legal, you might end up tearing all your hair out trying to find the bug in your Rails program. Other JSON linters correctly report the JSON as invalid.
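For comparison, here is a small sketch of how a strict parser treats this input, using Python's standard `json` module, which only skips the four RFC whitespace characters between terms. The helper name `check` and the message format are assumptions made for this example, not anything jsonlint.com does.

```python
import json

EM_SPACE = "\u2003"

# Same array as in the example above, built explicitly so the EM SPACE is visible.
doc_bad = f"[1,{EM_SPACE}2,{EM_SPACE}3]"
doc_ok = "[1, 2, 3]"  # ordinary ASCII spaces


def check(text: str) -> str:
    """Hypothetical helper: report whether text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the offset where parsing failed; it can equal len(text)
        # for truncated input, so guard the lookup.
        if err.pos < len(text):
            return f"invalid at offset {err.pos}: U+{ord(text[err.pos]):04X}"
        return f"invalid at offset {err.pos}: unexpected end of input"
    return "valid"


print(check(doc_ok))
print(check(doc_bad))
```

Reporting the offending code point makes it obvious that the rejected character is a non-ASCII space and not a typo in the data itself.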
@7stud thank you. Fixed both at https://jsonlint.com and https://jsoncompare.com
Hmm, wrong, got build error.