/zish

A data serialization format

Primary LanguageANTLRMIT No AttributionMIT-0

Zish

A data serialization format, designed to be an improvement on JSON. It adds timestamp, bytes and decimal types, as well as multi-line strings.

  • File extension .zish

  • MIME type application/x.zish

Zish Format

Here’s an example of Zish:

/* This is a comment */

/* Curly brackets delimit a map */
{
  "title": "A Hero of Our Time",  /* A key / value pair of strings */
  "description": null,
  "key": 'a3NoaGdybA==',  /* Single quotes delimit base64 encoded binary */
  "number_of_novellas": 5,
  "price": 7.99,  /* A decimal number */
  "read_date": 2017-07-16T14:05:00Z,
  "tags": [  /* Square brackets delimit a list */
    "russian",
    "novel",
    "19th century",
  ],
  "would_recommend": true,
}

Zish is a valid stream of Unicode code points encoded in UTF-8. The formal specification of Zish is an ANTLR grammar. Zish has the following data types:

Timestamp

Follows the RFC3339 format. Eg. 2017-08-09T10:40:09Z, 2017-08-09T10:40:09.037Z.

String

Strings are sequences of Unicode characters and are delimited by a double-quote character ", and can contain line breaks. If you need to write a string that contains a " then escape it with a backslash by doing \". There are other escapes allowed in strings. Here’s the full list:

Escape Description ASCII Hex Value

\a

Alert (beep sound)

07

\b

Backspace

08

\t

Horizontal tab

09

\n

Linefeed

0A

\v

Virtical tab

0B

\f

Formfeed

0C

\r

Carriage return

0D

\"

Double quotation mark

22

\\

Backslash

5C

\ followed by either a Carriage Return or Linefeed, or Carriage Return then Linefeed.

A backslash before a newline sequence is a way to remove a newline.

\u followed by four hex digits.

A unicode code point.

\U followed by eight hex digits.

A unicode code point.

Bytes

Binary data is delimited by the single-quote character, and represented by a base64 encoding as specified by RFC3548. Eg. 'a3NoaGdybA=='.

Integer

Integers can’t begin with a +, and integers with more than one digit can’t begin with a zero. Examples of valid integers are:

  • 0

  • -0

  • 389

  • -589

Boolean

The boolean data type can only have values true or false.

Null

The null type can only have the value null.

Decimal

A decimal is a representation of a real number in base 10. The exponent starts with an e or E. Examples are:

  • 0.993

  • 1.78e-1

Decimals can also have the special values NaN and Infinity (optionally prefixed with + or -).

List

A list is an ordered sequence of values, where the same value may occur more than once. Lists begin with a [ and end with a ] and the values are separated by ,. An example is:

[
  56,
  "pod",
  0,
]

Trailing commas are optional. An element of a list can be any Zish type including a list or map.

Map

A map is an unordered collection of key / value pairs. Duplicate keys aren’t allowed (for keys that are strings, the test for uniqueness is done without any normalization of the strings). Maps start with a { and end with a }. The pairs are separated by a , and the key is separated from the value with a :. Trailing commas are optional. Keys can by any type except for list, map or null, and values can be of any type. An example of a map is:

{
  "hello": 90,
  true: "larch",
  5: [
    null,
  ],
}

Comments

Comments begin with /* and end with */.

Comments are treated as whitespace rather than values, so they’re ignored by the parser and not passed through to the application.

In XML, comments are passed through to the application, which is thought to lead to an abuse of comments because it’s unclear whether they’re part of the content or not. JSON avoids this problem by not allowing comments. Zish steers a middle path here by allowing comments, but ignoring them at the parsing stage.

Comparison with JSON

To represent real numbers, JSON uses binary floating point numbers, but Zish uses decimal floating point numbers. Zish also has the following data types that JSON doesn’t have:

  • Timestamp

  • Bytes (a sequence of bytes)

Trailing commas in lists and maps are allowed in Zish, but they aren’t in JSON.

JSON has an 'object' type whereas Zish has a 'map' type. They both represent an unordered collection of name / value pairs, but they have two differences:

  • In JSON the 'name' part of the name / value pair can only be a string, but in Zish the 'name' part can be any Zish value.

  • In Zish, duplicate names aren’t allowed, but in JSON they are.

End of line (EOL) character sequences seem to be the source of problems in data serialization formats. One problem is that different operating systems have different conventions for what combination of characters constitutes an EOL. Unix based systems use LF, but Windows uses CR+LF. So if, for example, a file is created on a Debian machine and then opened on a Windows machine, all the text runs together without any line breaks.

JSON gets round this by saying that within strings, literal line breaks aren’t allowed, and you have to use an escaped line break \n instead.

Zish takes the view that Unicode has solved the EOL problem. Since Zish is a sequence of Unicode characters, it follows that Zish should respect the Unicode definition of EOLs (ie. LF, CR, CR+LF and others). So regardless of the operating system, Zish is first and foremost a Unicode sequence.

This allows multi-line strings to be written more clearly in Zish.

Comparison with Amazon Ion

Zish is influenced by the text representation of Amazon Ion, but there are several differences between them:

  • Ion doesn’t have a map type, instead it has a struct type which allows duplicate keys.

  • Ion has data types such as ‘symbol’, s-expressions, and ‘keyword’ which Zish doesn’t have.

  • There are three text types in Ion, but Zish just has one.

  • There are two binary data types in Ion, but Zish just has one.

  • Ion has a binary as well as text representation.

Comparison with the edn format

Zish is close in spirit to edn but again there are differences:

  • Edn is extensible, ie. it has a mechanism for user defined types.

  • Edn has types such as ‘character’, ‘symbol’ and ‘vector’ which Zish doesn’t have.

Implementations

If you’re working on an implementation of Zish, raise an issue on GitHub and we’ll add a link. It doesn’t need to be a complete implementation, a work-in-progress is fine.