groupon/cson-parser

Formal definition of CSON

hildjj opened this issue · 23 comments

Moving bevry/cson#38 over to this project.

JSON suffered from being too tied to the JavaScript programming language early on. I suggest a document describing the format being parsed, so that other interoperable implementations can be built.

The key here is "interoperable". I want to write CSON parsers in C, Python, etc. that don't have the assumptions of ECMAScript (particularly with respect to duplicate keys, strings, and numbers) baked in.
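To make the concern concrete, here is a small Python sketch of two places where a parser's host language leaks into the result: duplicate keys and large integers. It is not part of any CSON tooling. It uses the standard `json` module as a stand-in because there is no CSON spec to say what a CSON parser should do, and the sample inputs are made up for illustration.

```python
import json

# Made-up input with a duplicate key; the JSON grammar does not forbid it.
doc = '{"a": 1, "a": 2}'

# Python's json module silently keeps the last value,
# which is also what JavaScript's JSON.parse does.
last_wins = json.loads(doc)

# A parser could just as well preserve every pair; a spec has to choose.
all_pairs = json.loads(doc, object_pairs_hook=list)

# 2**53 + 1 is exact as a Python int but not as an IEEE 754 double,
# which is the number type JavaScript's JSON.parse produces.
big = json.loads("9007199254740993")
as_double = float(big)

print(last_wins, all_pairs, big, int(as_double))
```

Two conforming-looking parsers can therefore disagree on the same file unless the format states which behaviour is required.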

Thanks! This sounds like a great idea. My biggest concern would be us breaking existing CSON files out there (that rely on all the dirty hacks CoffeeScript allows).

It's better to break things earlier than later. I would suggest starting by declaring what you think the format is, then dealing with the edge cases as they are reported. It's not going to get easier as CoffeeScript evolves.

Yeah, without a spec, CSON parser libraries written for other languages are extremely unlikely to accept the same set of inputs and produce the same outputs. A file written for one will break when parsed by another. (I've played around with CoffeeScript a little, and heck if I can tell you how the compiler will parse a given input.) You'll have Markdown all over again.

Just a thought, how about the spec being written in literate coffeescript with executable test cases?

If we add a spec, I would think more along the lines of a grammar, e.g. a PEG. That way we could also implement cson-parser in terms of that spec. And PEG should be reasonably portable so that it's easy to consume/port to other languages.
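As a rough illustration of how a PEG can double as both spec and implementation, here is a minimal Python sketch of PEG-style ordered choice over a tiny value grammar. Everything in it is an assumption made for the example: the rule names, the helper functions `regex`, `choice` and `parse`, and the subset of values covered. It is not the grammar used by cson-parser.

```python
import re

# Each rule takes (text, pos) and returns (value, new_pos), or None on failure.

def regex(pattern, convert):
    """Build a rule that matches a regular expression at the current position."""
    compiled = re.compile(pattern)

    def rule(text, pos):
        m = compiled.match(text, pos)
        return (convert(m.group(0)), m.end()) if m else None

    return rule


def choice(*rules):
    """PEG ordered choice: the first alternative that matches wins."""

    def rule(text, pos):
        for r in rules:
            result = r(text, pos)
            if result is not None:
                return result
        return None

    return rule


# value <- boolean / null / number / string   (illustrative subset only)
boolean = regex(r"true|false", lambda s: s == "true")
null = regex(r"null", lambda s: None)
number = regex(r"-?\d+", int)
string = regex(r'"[^"\\]*"', lambda s: s[1:-1])  # no escapes in this sketch
value = choice(boolean, null, number, string)


def parse(text):
    """Parse a whole input as a single value, rejecting trailing characters."""
    result = value(text, 0)
    if result is None or result[1] != len(text):
        raise ValueError(f"cannot parse: {text!r}")
    return result[0]
```

Because ordered choice is deterministic, the grammar line in the comment and the code that implements it stay in one-to-one correspondence, which is the property that would make such a grammar easy to port.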

👍

The gold standard for a spec is, of course, the JSON spec (http://www.json.org/), lest anyone think a spec must be a long, stuffy document. All a spec has to do is communicate the language to a human implementer, and be unambiguous. A PEG works as long as it is sufficiently human-readable.

The JSON spec format works for a simple, straightforward data format like JSON. I doubt it will still be nice and understandable when it meets the monstrosity that is CoffeeScript syntax. ;) I'm not 100% convinced that implying CSON is a viable data interchange format is doing any good. Especially since CSON supports operations that are pretty tightly coupled to JavaScript floating point semantics etc. That's not an argument against properly spec'ing what it looks like - just against adding wording to the docs suggesting that it's a good idea to use it across stacks instead of JSON or YAML.

So, you're saying that if I don't have a CoffeeScript parser handy in my language, I should use YAML. (JSON doesn't have comments) I'll accept that, stop complaining, and leave you to your much smaller corner of the Internet than you could have had.

I still believe this is worth doing. Sorry if my previous comment was misleading in that regard.

Atom stores configuration data on disk in CSON, so that's what got me interested in whether this was a "real" format (that could conceivably be read by an arbitrary program) or not. That said, Emacs configuration is stored in the form of ELisp programs, and it's still my favorite editor. :)

By the way, it looks like CSON is a superset of JSON.

I'd add an "(*) well-formatted JSON". But yes, definitely worth mentioning in a potential spec.

I just found CSON, and it looks interesting. I am one of the authors of a JSON library, but our architecture allows us to parse and serialize other formats as long as the internal data model is sufficiently similar. CSON seems to fit that bill.

As we are a C++ library, any kind of reference to Coffee-Script (or a reference implementation written in CS) is mostly useless to us. I can only repeat and stress the importance of having a real specification in an implementation-language-agnostic way.

We are also not just the authors of a JSON library, but for parsing we are also using our own PEG parser library, the PEGTL. I have some experience with writing extended JSON grammars, e.g. we are just about to define a standard for "relaxed JSON", calling it JAXN.

I'd like to see if CSON is another candidate for our library and if there is interest from your side to come up with a more formal specification for it. If you could write a (complete) list of features that CSON should have (being a sub-set of Coffee-Script), I could try to come up with a PEG or a CFG for it (similar to the actual JSON grammar from RFC 7159).

I'd like to co-operate on this, but I would also like to avoid wasting each other's time in case we can not agree on some common goals. Please let me hear your thoughts about this and whether you can see CSON becoming a Coffee-Script independent, self-contained standard (which can still be a sub-set of Coffee-Script, that is not a problem).

I know that @dbushong spent some time writing a PEG (?) for CSON while trying to migrate away from our dependency on coffee-script. I'm not 100% sure where exactly he ran into problems. I think it was something about the fairly liberal whitespace handling in coffee-script..?

Yeah, lemme see if I can find my work thus far and stick it somewhere.

https://github.com/groupon/cson-parser/blob/dpb-native-parser/src/cson.pegjs

There's what I've got thus far. The issues I ran into were, unsurprisingly, around corner cases in object tree parsing. In certain cases (I'll try to dig up a repro) exdented objects are incorrectly parsed as part of the preceding object.

A grammar will usually be only a starting point; additional rules will apply. This is even the case for the JSON grammar itself. I'll check out the grammar you wrote/linked and report back when I have had some more time for it... thanks so far.

I just want a quick way to know how to include '' in a string value. There does not seem to be a simple here-are-all-the-rules document for this, or I am missing it. Seems to me maybe that's more appropriate an issue to bring up at bevry/cson#38 - but that issue, of course, led me here.

I believe CSON accepts "..." or '...', so you should be able to say foo: "this 'and' that"

You should also be able to backslash-escape things, so even foo: 'this \'and\' that'
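For anyone implementing this outside CoffeeScript, here is a minimal Python sketch of decoding such a literal. It is only an illustration: the `unquote` helper and the `ESCAPES` table are hypothetical, the set of escapes is an assumption (no spec lists them), and features such as interpolation in double-quoted strings and multi-line strings are ignored.

```python
# Assumed escape table; the set CoffeeScript actually accepts may differ.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", "'": "'", '"': '"'}


def unquote(literal: str) -> str:
    """Decode a single-line '...' or "..." literal (hypothetical helper)."""
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError("not a quoted string literal")
    quote, body = literal[0], literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # A backslash must be followed by a known escape character.
            if i + 1 >= len(body):
                raise ValueError("dangling backslash")
            nxt = body[i + 1]
            if nxt not in ESCAPES:
                raise ValueError(f"unsupported escape: \\{nxt}")
            out.append(ESCAPES[nxt])
            i += 2
        elif ch == quote:
            # The delimiter itself may only appear escaped.
            raise ValueError("unescaped quote inside literal")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With these assumptions, both spellings from the two comments above decode to the same value: `unquote('"this \'and\' that"')` and `unquote(r"'this \'and\' that'")` each give the text this 'and' that.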

ehhc commented

Hey guys, I want to use CSON in a Flutter/Dart project (for interoperability with legacy code). Unfortunately, there is no Dart parser for CSON. Furthermore, without a written specification, it's hard to write a parser myself. Any ideas what to do? Do you, by chance, know of any Dart CSON parser?

This thread is about defining a spec that could be parsed with a PEG grammar. I started and abandoned defining this a while back, but currently the spec is "what this version of coffeescript + this library can parse" - sorry