circlecell/jsonlint.com

json containing illegal whitespace between terms is deemed valid

7stud opened this issue · 2 comments

7stud commented

The Unicode whitespace character EM SPACE (U+2003, encoded as e2 80 83 in UTF-8) is an illegal character between JSON terms. See any vintage of the JSON RFC, which says:

   Insignificant whitespace is allowed before or after any of the six
   structural characters.

      ws = *(
              %x20 /              ; Space
              %x09 /              ; Horizontal tab
              %x0A /              ; Line feed or New line
              %x0D )              ; Carriage return

That means the only legal whitespace characters are a subset of the ASCII whitespace characters: space, horizontal tab, line feed, and carriage return.

Even though JSON allows UTF-8 characters, that does not mean that you can put any UTF-8 character between JSON terms. Putting an EM SPACE between terms is conceptually similar to putting, say, a z between JSON terms.
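To make the comparison concrete, here is a minimal sketch using Python's standard `json` module, which (as far as I know) skips only the four whitespace characters from the grammar above. The helper name `parses` is made up for illustration; it is not part of any library.

```python
import json


def parses(text):
    """Return True if Python's json module accepts text, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses("[1, 2]"))        # ASCII space (U+0020) between terms
print(parses("[1,\t2]"))       # horizontal tab (U+0009) between terms
print(parses("[1,\u20032]"))   # EM SPACE (U+2003) between terms
print(parses("[1,z2]"))        # a stray z between terms
```

With a parser that follows the RFC grammar, the first two inputs should be accepted and the last two rejected, for the same reason: neither EM SPACE nor z is in the ws rule.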

Here is a simple example of JSON with an EM SPACE after each comma:

[1, 2, 3]

jsonlint.com reports that as valid JSON. The linter should report it as invalid. Why is that important? Suppose you are using Ruby on Rails, and Rails raises an unexpected-token error when parsing JSON with an EM SPACE between terms. If you then check the JSON with a linter as the authority on what is legal JSON, and the linter says it is legal, you might end up tearing all your hair out trying to find the bug in your Rails program. Other JSON linters correctly report the JSON as invalid.
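Because an EM SPACE looks just like an ordinary space on screen, a small scanner can help locate it when a parser complains. The following is only a sketch of the kind of check described above, not how jsonlint.com or Rails works internally; the function name `find_illegal_whitespace` and the constant `JSON_WS` are hypothetical.

```python
JSON_WS = " \t\n\r"  # the four characters permitted by the RFC's ws rule


def find_illegal_whitespace(text):
    """Return (index, code point) pairs for whitespace outside JSON strings
    that is not one of the four characters allowed by the ws rule."""
    hits = []
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False      # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False    # closing quote ends the string
        elif ch == '"':
            in_string = True         # opening quote starts a string
        elif ch.isspace() and ch not in JSON_WS:
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits


sample = "[1,\u20032,\u20033]"  # EM SPACE after each comma
print(find_illegal_whitespace(sample))
```

Whitespace inside string values is deliberately ignored, since any Unicode character is legal there; only whitespace between terms is reported.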

finom commented

Hmm, wrong, got build error.