Zulu-Inuoe/jzon

Position-tracking parser

Zulu-Inuoe opened this issue · 1 comment

There should be a feature available for jzon:parse (or perhaps a separate entry point?) to allow tracking source location, whitespace, comments, and original literal text[1] while reading JSON data.

The use-case for this is using jzon to edit JSON files, such as a configuration file, while preserving all source formatting & comments.

[1]: For example, if the original JSON was 0.201e5, we don't want to output 20100.0, and if the input was "\u0020", we don't want to output " "
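To make footnote [1] concrete, here is a minimal sketch in Python, purely for illustration (Token, scan_scalar and emit are hypothetical names, not jzon's API). It keeps each scalar's raw text and source offsets next to its decoded value, and only re-serializes a value once it has actually been changed. It handles scalars only and assumes the caller already knows the offset where a literal starts.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Token:
    value: object  # what a plain parser would return
    raw: str       # the exact source text of the literal
    start: int     # offset of the first character
    end: int       # offset one past the last character


# JSON scalar literals: strings (with escapes), numbers, true/false/null.
SCALAR = re.compile(
    r'"(?:[^"\\]|\\.)*"'
    r'|-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?'
    r'|true|false|null'
)


def scan_scalar(text, pos):
    """Read one scalar literal starting exactly at `pos`, keeping its raw text."""
    m = SCALAR.match(text, pos)
    if m is None:
        raise ValueError(f"no JSON scalar at offset {pos}")
    raw = m.group()
    return Token(json.loads(raw), raw, m.start(), m.end())


def emit(token):
    """Re-serialize a token, reusing the original text if the value is unchanged."""
    original = json.loads(token.raw)
    # Compare types too, so that e.g. True and 1 are not treated as equal.
    if type(original) is type(token.value) and original == token.value:
        return token.raw
    return json.dumps(token.value)
```

For example, scan_scalar('{"x": 0.201e5}', 6) gives a token whose value is 20100.0 while emit still writes 0.201e5; calling json.dumps on the value alone would have written 20100.0. Likewise a "\u0020" literal decodes to a single space but is written back as "\u0020".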

We'd necessarily need to either track this information in a custom data structure or, perhaps, track it separately from the returned data.

While tracking it separately has the advantage of keeping the simple 'returns a hash-table/vector/etc.' interface, it complicates any code that wants to edit this formatting manually.
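A rough sketch of the "track it separately" option, again in Python and entirely hypothetical (parse_with_spans and replace_value are made-up names). The parser returns ordinary dicts/lists plus a side table mapping each value's path (object keys and array indices) to its (start, end) offsets, so a single value can be rewritten by splicing the original text. It assumes strict JSON (no comments) and does no error recovery.

```python
import json
import re

WS = re.compile(r'[ \t\n\r]*')  # JSON's four whitespace characters
STRING = re.compile(r'"(?:[^"\\]|\\.)*"')
SCALAR = re.compile(
    STRING.pattern
    + r'|-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?'
    + r'|true|false|null'
)


def parse_with_spans(text):
    """Return (value, spans): plain Python data plus path -> (start, end)."""
    spans = {}

    def skip(pos):
        return WS.match(text, pos).end()

    def expect(pos, ch):
        pos = skip(pos)
        if text[pos:pos + 1] != ch:
            raise ValueError(f"expected {ch!r} at offset {pos}")
        return pos + 1

    def value(pos, path):
        start = pos = skip(pos)
        if text[pos:pos + 1] == "{":
            result, pos = {}, skip(pos + 1)
            if text[pos:pos + 1] != "}":
                while True:
                    m = STRING.match(text, skip(pos))
                    if m is None:
                        raise ValueError(f"expected a key at offset {pos}")
                    key = json.loads(m.group())
                    item, pos = value(expect(m.end(), ":"), path + (key,))
                    result[key] = item
                    pos = skip(pos)
                    if text[pos:pos + 1] != ",":
                        break
                    pos += 1
            pos = expect(pos, "}")
        elif text[pos:pos + 1] == "[":
            result, pos = [], skip(pos + 1)
            if text[pos:pos + 1] != "]":
                while True:
                    item, pos = value(pos, path + (len(result),))
                    result.append(item)
                    pos = skip(pos)
                    if text[pos:pos + 1] != ",":
                        break
                    pos += 1
            pos = expect(pos, "]")
        else:
            m = SCALAR.match(text, pos)
            if m is None:
                raise ValueError(f"unexpected input at offset {pos}")
            result, pos = json.loads(m.group()), m.end()
        spans[path] = (start, pos)  # record where this value came from
        return result, pos

    result, end = value(0, ())
    if skip(end) != len(text):
        raise ValueError(f"trailing data at offset {skip(end)}")
    return result, spans


def replace_value(text, spans, path, new_value):
    """Splice a new value into the original text, leaving everything else alone."""
    start, end = spans[path]
    return text[:start] + json.dumps(new_value) + text[end:]
```

The catch shows up right away: after one splice, every span past the edit point is stale and has to be shifted or recomputed, which is exactly the kind of bookkeeping that makes manual edits awkward with this approach.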

Hey, that sounds exactly like what I'm trying to do for Common Lisp! My solution so far is to use Eclector and produce a CLOS-based syntax tree.

Each node has 4 slots:

  • content: the value a normal reader would return, or another node (in the case of a list, for example)
  • prefix: a string or another node that contains everything between the previous node and this one
    • e.g. when parsing a #| comment |# b, a's prefix would be empty and b's prefix would be #| comment |#
    • I do this in the hope that I can move nodes around and keep the comment in the right place most of the time
    • I might add a suffix to handle cases like this: (a #|...|# ) b. Right now, b's prefix would be ) and #|...|# would be its own node, but I'm not sure what happens to the whitespace after the #|...|#.
  • source: a cons containing the position in the source stream (start . end); the range includes the prefix
  • raw: the raw string from the source; it might be empty for performance reasons. I use displaced strings, which, in a quick benchmark, were much faster because they cons less.
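The four-slot node above can be approximated for JSON tokens as follows. This is a flat, token-level sketch in Python, not Eclector or breeze code, and it assumes /* */ and // comments are allowed in the input (strict JSON has no comments). Each hypothetical Node keeps its decoded content, its prefix trivia, a (start, end) source range that includes the prefix, and a shared reference to the whole input from which raw is sliced on demand. Python slices still copy, so this only loosely mirrors the displaced-string idea of not copying at parse time. Nesting (a list's content being child nodes) is left out for brevity, so punctuation nodes just hold their character.

```python
import json
import re
from dataclasses import dataclass, field

# Whitespace and comments ("trivia") that end up in a node's prefix.
TRIVIA = re.compile(r'(?:\s|/\*.*?\*/|//[^\n]*)*', re.S)
TOKEN = re.compile(
    r'"(?:[^"\\]|\\.)*"'
    r'|-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?'
    r'|true|false|null|[{}\[\]:,]'
)
PUNCTUATION = frozenset("{}[]:,")


@dataclass
class Node:
    content: object                # decoded value, or the punctuation character
    prefix: str                    # trivia between the previous node and this one
    source: tuple                  # (start, end) offsets; start includes the prefix
    text: str = field(repr=False)  # the whole input, shared by every node

    @property
    def raw(self):
        # Computed from the offsets on demand instead of stored per node.
        start, end = self.source
        return self.text[start + len(self.prefix):end]


def read_nodes(text):
    """Split `text` into nodes; return them plus any trailing trivia."""
    nodes, pos = [], 0
    while True:
        trivia = TRIVIA.match(text, pos)
        tok_start = trivia.end()
        if tok_start == len(text):
            return nodes, text[pos:]
        m = TOKEN.match(text, tok_start)
        if m is None:
            raise ValueError(f"unexpected input at offset {tok_start}")
        raw = m.group()
        content = raw if raw in PUNCTUATION else json.loads(raw)
        nodes.append(Node(content, trivia.group(), (pos, m.end()), text))
        pos = m.end()


def unparse(nodes, trailing):
    """Concatenate prefixes and raw text; reproduces the input exactly."""
    return "".join(n.prefix + n.raw for n in nodes) + trailing
```

Because every character of the input lands in exactly one prefix, one raw, or the trailing trivia, joining them back reproduces the input character for character, and moving a node also moves the comment in front of it.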

Keep in mind that this is very much a prototype and might change completely. But I figured this info might be useful for brainstorming, and I think brainstorming on how to handle this problem for jzon will help me with breeze too.

One design decision I made recently is to parse the input from strings instead of a stream, because I'm going to keep all the information anyway. I was going back and forth to get the raw strings after Eclector gave me the parsed results, but doing that on a string stream is both very inefficient and non-portable.