LSON: Lucid Serialized Object Notation

Introduction
1. Key Differences from JSON
2. Special Values
Example LSON
Whitespace
Comments
Strings
1. Escape Sequences
2. String Concatenation
Words
1. Word Concatenation
Arrays
Dictionaries
Structures

Introduction

LSON is a concise data representation that has the simplicity and expressiveness of JSON, but differs in two primary areas:

It’s intended to be concise and friendly to humans as well as computers.
It does not aim to mirror JavaScript.

LSON is a superset of JSON: Any legal JSON is legal LSON.

LSON expresses data using five primitives: words, strings, arrays, dictionaries, and structures. It has no inherently special values like true, false, null, or numbers, and instead uses generic words to express values that may have additional meaning and semantics to encoders beyond their string value.

Key Differences from JSON

LSON supports comments.
Commas (and semicolons) are treated as whitespace. Put them anywhere you want, or nowhere.
String quoting is optional for words (strings without whitespace).
All possible special (word) values are handled seamlessly (e.g. NaN, infinity, undefined, maybe, 0xfffe, #ff8800).
LSON supports templated objects (structures).

Example LSON

Following are some example LSON snippets to illustrate various features of the format.

// Comments are C-style: double slash to end of line, or enclosed with `/*` and `*/`.

/* This is an example using slash-star delimiters. */

{
    glossary: {
        title: 'example glossary'  // There are six legal string-delimiter pairs.
        "Gloss Div": {
            title: S
            "Gloss List": {
                "Gloss Entry": {
                    /* Strings may be unquoted as long as they contain no whitespace. */
                    ID: SGML

                    SortAs:  SGML
                    Acronym: SGML
                    "Gloss Term": "Standard Generalized Markup Language"

                    // Unquoted strings may contain whitespace if escaped.
                    Abbrev: ISO\ 8879:1986

                    "Gloss Def": {
                        para: "A meta-markup language, used to create markup languages "
                            + "such as DocBook."

                        // Note that commas and semicolons are considered whitespace.
                        "Gloss SeeAlso": [ GML, XML ];
                        "Gloss See": markup;
                    }
                }
            }
        }
    }
}

// An Example Menu Description
{
    menu: {
        id:    file;
        value: file;
        popup: {
            menuitem <value, onclick>: [       // Structured objects
                < New,   CreateNewDoc() >
                < Open,  OpenDoc()      >
                < Close, CloseDoc()     >
            ]
        };
    }
}

{
    widget: {
        debug: on
        "debug:Level":  1.0      // The literal string value "1.0"
        "debug:Weight": Infinity // Converts to floating-point +infinity if understood, else string
        "debug:Prefix": null     // Converts to null value if understood, otherwise string
        "debug:Mask":   0xffe0   // Converts to hex number value if understood, else string

        window: {
            title:  "Sample Konfabulator Widget"
            name:   main_window
            width:  500  // Converts to number value if understood
            height: 500
        }
        image: {
            src:       Images/Sun.png
            name:      sun1
            hOffset:   250
            vOffset:   250
            alignment: center
        }
        text: {
            data:      Click\ Here
            size:      36
            style:     bold
            name:      text1
            hOffset:   250
            vOffset:   100
            alignment: center
            onMouseUp: "sun1.opacity = (sun1.opacity / 100) * 90;"
        }
    }
}

Whitespace

LSON whitespace includes all standard Unicode whitespace characters, as well as commas and semicolons:

Unicode	Escape	Description
U+0009	\t	Tab
U+000a	\n	Newline, or line feed
U+000b	\u000b	Vertical tab
U+000c	\f	Form feed
U+000d	\r	Carriage return
U+0020	\u0020	Standard space character
U+002c	\u002c	Comma
U+003b	\u003b	Semicolon
U+0085	\u0085	Next line
U+00a0	\u00a0	No-break space
U+1680	\u1680	Ogham space mark
U+2000	\u2000	En quad
U+2001	\u2001	Em quad (mutton quad)
U+2002	\u2002	En space (nut)
U+2003	\u2003	Em space (mutton)
U+2004	\u2004	Three-per-em-space (thick space)
U+2005	\u2005	Four-per-em-space (mid space)
U+2006	\u2006	Six-per-em-space
U+2007	\u2007	Figure space
U+2008	\u2008	Punctuation space
U+2009	\u2009	Thin space
U+200a	\u200a	Hair space
U+2028	\u2028	Line separator
U+2029	\u2029	Paragraph separator
U+202f	\u202f	Narrow no-break space
U+205f	\u205f	Medium mathematical space
U+3000	\u3000	Ideographic space
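As a rough illustration of the table above, a decoder could treat whitespace as a simple set-membership test. The following Python sketch is not part of LSON itself: the names `LSON_WHITESPACE`, `is_lson_whitespace` and `split_on_whitespace` are hypothetical, and the splitter ignores quoting, escapes and comments.

```python
# Code points copied from the whitespace table above (27 entries),
# including the comma and semicolon.
LSON_WHITESPACE = frozenset(
    "\t\n\u000b\f\r "
    ",;"
    "\u0085\u00a0\u1680"
    "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a"
    "\u2028\u2029\u202f\u205f\u3000"
)


def is_lson_whitespace(ch: str) -> bool:
    """Return True if the single character ch is LSON whitespace."""
    return ch in LSON_WHITESPACE


def split_on_whitespace(text: str) -> list[str]:
    """Split text into runs of non-whitespace characters.

    Hypothetical helper: it does not understand quoted strings,
    escape sequences or comments.
    """
    tokens, current = [], []
    for ch in text:
        if is_lson_whitespace(ch):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens
```

Because commas and semicolons are in the set, `GML, XML;` and `GML XML` split into the same two tokens.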

Comments

// Single line comments run from double forward slashes to end of line.

/*  Slash-star comments: this is probably the best form for block
    comments. */

Strings

Strings are delimited with any of the following character pairs:

Quotes	Character Codes
"string"	U+0022 U+0022 (Quotation Mark)
'string'	U+0027 U+0027 (Apostrophe)
«string»	U+00ab U+00bb ({Left,Right}-Pointing Double Angle Quotation Mark)
‘string’	U+2018 U+2019 ({Left,Right} Single Quotation Mark)
“string”	U+201c U+201d ({Left,Right} Double Quotation Mark)

Escape Sequences

Strings may contain the following escape sequences:

Sequence	Description
`\0`	Null byte
`\n`	new line
`\r`	carriage return
`\t`	horizontal tab
`\u####`	Unicode character with four hexadecimal digits
`\U######`	Unicode character with six hexadecimal digits
`\<any>`	Yields that character unchanged, such as `\'` or `\\`
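The escape table above can be sketched as a small decoder. This Python example is only an illustration: the function name `decode_escapes` is hypothetical, and the handling of malformed input (raising `ValueError` for bad Unicode escapes, keeping a trailing lone backslash) is an assumption, since the table does not say what should happen in those cases.

```python
import string

# Escapes with a fixed meaning, taken from the table above.
SIMPLE_ESCAPES = {"0": "\0", "n": "\n", "r": "\r", "t": "\t"}
# Number of hexadecimal digits that follow each Unicode escape letter.
UNICODE_WIDTHS = {"u": 4, "U": 6}


def decode_escapes(body: str) -> str:
    """Decode the escape sequences in the body of an LSON string."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\" or i + 1 == len(body):
            # Ordinary character. A trailing lone backslash is kept
            # unchanged, which is an assumption of this sketch.
            out.append(ch)
            i += 1
            continue
        code = body[i + 1]
        width = UNICODE_WIDTHS.get(code)
        if width is None:
            # Known simple escape, or any other character yielded unchanged.
            out.append(SIMPLE_ESCAPES.get(code, code))
            i += 2
            continue
        digits = body[i + 2 : i + 2 + width]
        if len(digits) != width or any(d not in string.hexdigits for d in digits):
            raise ValueError(f"bad Unicode escape at index {i}")
        # chr() itself raises ValueError above U+10FFFF.
        out.append(chr(int(digits, 16)))
        i += 2 + width
    return "".join(out)
```

The "any other character" rule is what lets an escaped space, as in `ISO\ 8879:1986`, decode to a plain space.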

String Concatenation

In order to support human-readable long strings, the + operator may be used to construct concatenations. For example:

{
    strBlock: "Knock knock.\n"
            + "Who's there?\n"
            + "Bug in your state machine.\n"
            + "Who's there?\n"
}

Words

Words are unquoted strings. For example,

node: {
    id:       1223-02
    class:    sphere
    weight:   112.23e-6
    previous: null
    next:     0xff128bc5
}

It is important to note that this feature is more useful than just as a shorthand for string values: it also provides an excellent mechanism for conveying arbitrary special values.

JSON defines several special values: true, false, null and numbers. JSON numbers are a subset of the legal JavaScript (the “J” in JSON) number representations. JSON rejects, for example, numbers of the form ".12", because it requires a leading zero. In addition, the IEEE special values NaN and Infinity are unsupported. Other JavaScript values, such as 0x77 and undefined, are also lacking.

The elegance of JSON has nevertheless made it an overwhelmingly successful data interchange format for all kinds of situations. In this sense, it's as useful for Python or C++ as it is for JavaScript. That raises a question: what to do about Python's None, CSS's #23ec98, C++'s 0xfffe or Scala's Any? The temptation for those who wish to expand on JSON is to formalize these special values, usually starting with the introduction of additional JavaScript values.

LSON takes a different approach. Instead of adding additional value types, LSON handles all of them as simple bare words. This approach provides implicit support for any and all special value types where it makes sense – the encoders and decoders decide. If a decoder does not understand a special value (for example, a C++ parser that encounters the word #ff7e22), the word is simply interpreted as the string value "#ff7e22". A decoder that understands CSS color values, however, might parse this string as a color, with value rgb(255,126,34). Or it might not. It's really not the job of the serialization protocol to determine interpretation. Note, however, what happens on re-serialization: the C++ encoder writes the value back out as the word value #ff7e22, and the CSS encoder writes it back out as the word value #ff7e22 as well. If the value is transformed, then it's written out as the application deems proper. It's not up to the serialization format to dictate usage.

This provides a simple, stable mechanism for the interchange of data across many different types of encoders and decoders, and additionally provides for a way to convey domain-specific data values.
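To make the "decoders decide" idea concrete, here is a minimal Python sketch of two hypothetical decoders. The function names, the regular expressions and the choice of which words count as "understood" are all assumptions made for illustration. LSON does not prescribe any of them.

```python
import math
import re

# Numeric forms this hypothetical decoder chooses to understand.
_INTEGER = re.compile(r"[+-]?\d+")
_FLOAT = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?")
_HEX = re.compile(r"0[xX][0-9a-fA-F]+")
_CSS_COLOR = re.compile(r"#[0-9a-fA-F]{6}")

# Named words this decoder maps to native Python values.
_NAMED = {
    "true": True, "false": False, "null": None,
    "NaN": math.nan, "Infinity": math.inf, "-Infinity": -math.inf,
}


def interpret_word(word: str):
    """Return a native value if the word is understood, else the word itself."""
    if word in _NAMED:
        return _NAMED[word]
    if _HEX.fullmatch(word):
        return int(word, 16)
    if _INTEGER.fullmatch(word):
        return int(word)
    if _FLOAT.fullmatch(word):
        return float(word)
    return word  # Not understood: falls back to the plain string value.


def interpret_word_css(word: str):
    """A second decoder that additionally understands CSS hex colors."""
    if _CSS_COLOR.fullmatch(word):
        # "#ff7e22" becomes the (red, green, blue) tuple (255, 126, 34).
        return tuple(int(word[i : i + 2], 16) for i in (1, 3, 5))
    return interpret_word(word)
```

With these definitions, `interpret_word("#ff7e22")` returns the string `"#ff7e22"`, while `interpret_word_css("#ff7e22")` returns a color tuple. Both decoders read the same LSON text, and neither needs the format to know about colors.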

Word Concatenation

The concatenation operator always promotes words to strings, producing a string-valued result. For example, the LSON expression red + green + blue yields the string value "redgreenblue".
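The promotion rule can be sketched in a few lines of Python. The `Word` class and the `concatenate` function are hypothetical names for this example. The sketch assumes a decoder that keeps bare words distinct from quoted strings until concatenation is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A bare (unquoted) word as read by a hypothetical LSON tokenizer."""
    text: str


def concatenate(*operands) -> str:
    """Join the operands of a + chain, promoting every word to a string."""
    parts = []
    for operand in operands:
        if isinstance(operand, Word):
            parts.append(operand.text)  # Promote the word to its string value.
        elif isinstance(operand, str):
            parts.append(operand)
        else:
            raise TypeError(f"cannot concatenate {type(operand).__name__}")
    return "".join(parts)


# The result is always a plain string, never a Word.
assert concatenate(Word("red"), Word("green"), Word("blue")) == "redgreenblue"
```

One consequence of this rule is that a concatenated value such as `0x + ff` would be the string "0xff" rather than a word a decoder might read as a number.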

Arrays

Arrays encode ordered lists of items. They have the following properties:

They begin with a left square bracket ([, U+005b), followed by zero or more values, and terminated with a right square bracket (], U+005d).
Array values may be strings, arrays, or dictionaries.
Array values are separated by whitespace.
Arrays are contiguous. That is, there is no way in LSON to indicate an undefined element. For example, the LSON value [ a, b,,, e ] yields the array [ "a", "b", "e" ]. In this example, a, b and e are considered unquoted strings, and commas and spaces are considered whitespace. If sparse arrays are desired for a particular encoding, it is recommended that dictionaries be used with numeric key values. Encoders should follow the same convention, encoding sparse arrays as dictionaries.
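The contiguity rule follows directly from treating commas as whitespace, as the following simplified Python sketch shows. The function `parse_array` is hypothetical and handles only bare words and nested arrays (no quoted strings, escapes, comments, dictionaries or structures). It also approximates LSON whitespace with `str.isspace()` plus the comma and semicolon, which is an assumption.

```python
def parse_array(text: str) -> list:
    """Parse an LSON array of bare words and nested arrays (simplified)."""
    text = text.strip()

    def is_whitespace(ch: str) -> bool:
        return ch.isspace() or ch in ",;"

    def parse(pos: int):
        # text[pos] is "[". Returns (items, index just past the matching "]").
        items = []
        pos += 1
        while pos < len(text):
            ch = text[pos]
            if is_whitespace(ch):
                pos += 1  # Separators carry no meaning, however many there are.
            elif ch == "]":
                return items, pos + 1
            elif ch == "[":
                nested, pos = parse(pos)
                items.append(nested)
            else:
                start = pos
                while (pos < len(text) and not is_whitespace(text[pos])
                       and text[pos] not in "[]"):
                    pos += 1
                items.append(text[start:pos])
        raise ValueError("unterminated array")

    if not text.startswith("["):
        raise ValueError("array must begin with '['")
    result, end = parse(0)
    if end != len(text):
        raise ValueError("unexpected text after array")
    return result
```

Since the run `,,,` is skipped like any other whitespace, `parse_array("[ a, b,,, e ]")` produces three items and no placeholder for the "missing" elements.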

Dictionaries

Dictionaries (referred to as objects in JSON) are sets of key-value pairs. Keys are string values, and hence may be either quoted or unquoted. Dictionaries have the following properties:

They begin with a left curly bracket ({, U+007b), followed by zero or more key-value pairs, followed by a right curly bracket (}, U+007d).

Structures

A structure is really just a shorthand for expressing dictionaries, without repeating key names. The following fragment:

{
    someStruct <key1 key2 key3>: [
        < thing1 false  3 >
        < thing2 false 13 >
        < thing3 true  37 >
    ]
}

is a concise way to express the following:

{
    someStruct: [
        {   key1:thing1 key2:false key3:3  }
        {   key1:thing2 key2:false key3:13 }
        {   key1:thing3 key2:true  key3:37 }
    ]
}

If more values are given for a row than were specified in the structure template, the additional values will be ignored. If fewer values are given, the missing values will be undefined. For example,

{
    collection <value isPositive>: [
        < -1.5 false >
        <  0         >
        <  2.4 true  red null >
    ]
}

would yield the same result as this LSON:

{
    collection: [
        { value: -1.5, isPositive: false }
        { value: 0 }
        { value: 2.4, isPositive: true }
    ]
}

As a side note, this is not necessarily the best or most efficient way to express tabular data. That would be more like this:

{
    fields: [ first_name last_name ID ]
    rows: [
        [ Ariel   Astro    48844757 ]
        [ Blue    Blastar  23cc418e ]
        [ Castor  Cantrod  b12b4f89 ]
    ]
}

or just this (where row 0 is special and holds the column names):

[
    [ first_name  last_name  ID       ]
    [ Ariel       Astro      48844757 ]
    [ Blue        Blastar    23cc418e ]
    [ Castor      Cantrod    b12b4f89 ]
]
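Returning to structures: the expansion rule described above (pair template keys with row values by position, ignore extra values, leave missing ones undefined) can be sketched in Python as follows. The function name `expand_structure` is hypothetical, and representing "undefined" by omitting the key is an assumption that matches the equivalent LSON shown earlier in this section.

```python
def expand_structure(keys: list[str], rows: list[list]) -> list[dict]:
    """Expand structure rows into dictionaries using the template keys.

    zip() stops at the shorter of its inputs, so extra row values are
    ignored and keys without a value are left out of the dictionary.
    """
    return [dict(zip(keys, row)) for row in rows]


# The "collection <value isPositive>" example, with words kept as strings.
collection = expand_structure(
    ["value", "isPositive"],
    [
        ["-1.5", "false"],
        ["0"],
        ["2.4", "true", "red", "null"],
    ],
)
assert collection == [
    {"value": "-1.5", "isPositive": "false"},
    {"value": "0"},
    {"value": "2.4", "isPositive": "true"},
]
```

A decoder that prefers an explicit marker for missing values could fill them in instead, but that choice belongs to the application rather than to the format.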

CHollasch/LSON