Treeton is a JSON parser built on top of Treetop.
This is a pet project I developed to learn about grammars and parser generators. Along the way, I also gained a deeper understanding of the JSON format.
Despite its really cool name, this project is not intended to be used in production environments.
If you don't believe me and think Treeton is cool enough to be worth a try, let me throw a couple of benchmarks at you.
Here is the profiling of Treeton and Yajl parsing the same JSON document:
//TODO
//TODO
You can read the whole JSON RFC, but you can get the grammar in a nutshell just by looking at the railroad diagrams.
I picked the JSON grammar for this experiment because it's fairly simple. It is basically composed of six types: booleans, strings, numbers, arrays, objects and null.
Let's look at the grammar of each one of them and at how it has been translated into Treetop rules.
A JSON value can be any of these six types, which translates nicely into:
rule value
  string / number / array / object / true / false / null
end
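Treetop grammars are parsing expression grammars (PEGs), so the `/` in the rule above is an ordered choice: alternatives are tried left to right from the same position and the first one that matches wins. The plain-Ruby sketch below does not use Treetop at all; it only illustrates those semantics, and `literal` and `choice` are hypothetical helpers written for this example.

```ruby
require 'strscan'

# In this sketch a "parser" is a lambda that takes a StringScanner and
# returns the matched text, or nil when it does not match.

# Hypothetical helper: matches an exact string at the current position.
def literal(text)
  pattern = Regexp.new(Regexp.escape(text))
  ->(s) { s.scan(pattern) }
end

# Hypothetical helper modelling Treetop's "/" (PEG ordered choice):
# alternatives are tried left to right from the same start position,
# and the first one that matches wins.
def choice(*alternatives)
  lambda do |s|
    start = s.pos
    alternatives.each do |alt|
      s.pos = start # rewind before trying each alternative
      result = alt.call(s)
      return result if result
    end
    s.pos = start
    nil
  end
end

# Order matters: once 't' has matched, 'true' is never tried.
t_first    = choice(literal('t'), literal('true'))
true_first = choice(literal('true'), literal('t'))

puts t_first.call(StringScanner.new('true'))    # prints t
puts true_first.call(StringScanner.new('true')) # prints true
```

JSON sidesteps this pitfall in the value rule: every alternative starts with a different character (`"` for a string, `-` or a digit for a number, `[`, `{`, `t`, `f` and `n`), so at most one alternative can match at any position and their order does not change which documents are accepted.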
The boolean values true, false and the null value are terminals expressed in their own rules:
rule true
  'true'
end
rule false
  'false'
end
rule null
  'null'
end
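Terminal rules like these simply match a fixed string at the current position. As a rough sketch, assuming a hand-written scanner rather than Treetop's generated code, the three terminals could be recognized and mapped to Ruby values as follows; `LITERALS` and `scan_literal` are hypothetical names.

```ruby
require 'strscan'

# Hypothetical mapping from the JSON literal terminals to Ruby values.
LITERALS = { 'true' => true, 'false' => false, 'null' => nil }.freeze

# Tries each terminal at the current position, like the true, false and
# null rules. Returns [matched_text, ruby_value], or nil if none matches.
def scan_literal(scanner)
  LITERALS.each do |text, value|
    return [text, value] if scanner.scan(Regexp.new(Regexp.escape(text)))
  end
  nil
end

p scan_literal(StringScanner.new('false')) # prints ["false", false]

# A terminal only checks its own characters: "truex" matches "true" and
# leaves "x" behind for the enclosing rules (or an end-of-input check) to reject.
s = StringScanner.new('truex')
p scan_literal(s) # prints ["true", true]
p s.rest          # prints "x"
```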
A number is made of an integer part, optionally followed by a decimal part and an exponent. This has been translated into the following rule:
rule number
  integer_part decimal_part? exponent?
end
You can check out the detailed rules in the numbers grammar.
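The numbers grammar itself is not reproduced here. The sketch below approximates `integer_part`, `decimal_part` and `exponent` with regular expressions based on the number syntax in RFC 8259 (no leading zeros, at least one digit after the decimal point, an optional sign on the exponent). These definitions are assumptions and may differ in detail from Treeton's own sub-rules.

```ruby
# Assumed sub-rule definitions, following RFC 8259 (not copied from Treeton).
INTEGER_PART = /-?(?:0|[1-9][0-9]*)/ # optional minus, no leading zeros
DECIMAL_PART = /\.[0-9]+/            # at least one digit after the point
EXPONENT     = /[eE][+-]?[0-9]+/     # e or E, optional sign, digits

# integer_part decimal_part? exponent?, anchored to the whole input.
NUMBER = /\A#{INTEGER_PART}(?:#{DECIMAL_PART})?(?:#{EXPONENT})?\z/

def json_number?(text)
  NUMBER.match?(text)
end

p %w[0 -12.5e+3 012 1. .5].map { |t| json_number?(t) }
# prints [true, true, false, false, false]
```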
The rule for strings is pretty straightforward:
rule string
  quotation_mark (escaped_character / character)* quotation_mark
end
The definition of each of these sub-rules can be found in the strings grammar.
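Assuming the sub-rules follow RFC 8259, an `escaped_character` is a backslash followed by one of `"`, `\`, `/`, `b`, `f`, `n`, `r`, `t`, or by `u` and four hex digits, and a plain `character` is anything except a quote, a backslash or a control character (U+0000 to U+001F). The regex-based sketch below composes those pieces in the same shape as the `string` rule; the constant names mirror the rule names, but their definitions are assumptions rather than a copy of the strings grammar.

```ruby
# Assumed sub-rule definitions, following RFC 8259 (not copied from Treeton).
QUOTATION_MARK    = /"/
# Backslash plus a one-character escape, or \u followed by four hex digits.
ESCAPED_CHARACTER = /\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/
# Anything except the quote, the backslash and control characters.
CHARACTER         = /[^"\\\x00-\x1F]/

# quotation_mark (escaped_character / character)* quotation_mark
STRING = /\A#{QUOTATION_MARK}(?:#{ESCAPED_CHARACTER}|#{CHARACTER})*#{QUOTATION_MARK}\z/

def json_string?(text)
  STRING.match?(text)
end

p ['"hi"', '"a\\"b"', '"\\u00e9"', '"\\x"', '"open'].map { |t| json_string?(t) }
# prints [true, true, true, false, false]
```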
For arrays, the rule gets a little bit uglier in order to handle the three possible cases: an empty array, an array with a single element and an array with more than one element.
Notice the reference to the value rule defined earlier, which lets an array hold any of the types recognized by the JSON grammar.
rule array
  open_square_bracket value? (comma value)* close_square_bracket
end
The sub-rules can be found in the arrays grammar.
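Taken on its own, the rule above also accepts a leading comma, as in `[,1]`, because `value?` and `(comma value)*` are matched independently. Nesting the repetition inside the optional group, `open_square_bracket (value (comma value)*)? close_square_bracket`, covers the same three cases without that gap. The hypothetical `ArraySketch` class below is a plain-Ruby recursive-descent recognizer for that stricter variant that rewinds on failure the way a PEG does. To keep it short, values are limited to nested arrays, integers and the three literals, and the whitespace handling is an assumption.

```ruby
require 'strscan'

# Hypothetical recognizer for the stricter array rule, with PEG-style
# backtracking. Values are reduced to nested arrays, integers and literals.
class ArraySketch
  def initialize(text)
    @s = StringScanner.new(text)
  end

  # True if the whole input is exactly one array (surrounding whitespace allowed).
  def valid?
    !!array && (skip_ws; @s.eos?)
  end

  private

  # Runs the block and rewinds the scanner if it fails, like a PEG does
  # when an alternative or optional group does not match.
  def attempt
    start = @s.pos
    result = yield
    @s.pos = start unless result
    result
  end

  def skip_ws
    @s.skip(/[ \t\n\r]*/)
  end

  # Skips leading whitespace, then matches regex at the current position.
  def token(regex)
    skip_ws
    @s.scan(regex)
  end

  # Reduced value rule: array / integer / true / false / null.
  def value
    array || token(/-?(?:0|[1-9][0-9]*)|true|false|null/)
  end

  # open_square_bracket (value (comma value)*)? close_square_bracket
  def array
    attempt do
      next nil unless token(/\[/)
      attempt { elements } # optional group: rewinds if it fails
      token(/\]/)
    end
  end

  # value (comma value)*
  def elements
    return nil unless value
    loop { break unless attempt { token(/,/) && value } }
    true
  end
end

p ['[]', '[1, [true, null]]', '[,1]', '[1,]'].map { |t| ArraySketch.new(t).valid? }
# prints [true, true, false, false]
```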
The object rule is pretty similar to the array rule:
rule object
  open_curly_brace (string colon value)? (comma string colon value)* close_curly_brace
end
The sub-rules can be found in the objects grammar.
The comma rule is defined in the arrays grammar.
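Read on its own, the object rule above accepts a leading comma, as in `{, "a": 1}`, because the optional first member and the repetition are matched independently. A stricter variant is `open_curly_brace (string colon value (comma string colon value)*)? close_curly_brace`. The hypothetical `ObjectSketch` class below recognizes that variant in plain Ruby with PEG-style backtracking. Values are limited to nested objects, strings, integers and the three literals, and the string and whitespace patterns are assumptions based on RFC 8259.

```ruby
require 'strscan'

# Hypothetical recognizer for the stricter object rule, with PEG-style
# backtracking. Values are reduced to objects, strings, integers and literals.
class ObjectSketch
  # Assumed patterns, following RFC 8259 (not copied from Treeton).
  STRING = /"(?:\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})|[^"\\\x00-\x1F])*"/
  SCALAR = /-?(?:0|[1-9][0-9]*)|true|false|null/

  def initialize(text)
    @s = StringScanner.new(text)
  end

  # True if the whole input is exactly one object (surrounding whitespace allowed).
  def valid?
    !!object && (skip_ws; @s.eos?)
  end

  private

  # Runs the block and rewinds the scanner if it fails.
  def attempt
    start = @s.pos
    result = yield
    @s.pos = start unless result
    result
  end

  def skip_ws
    @s.skip(/[ \t\n\r]*/)
  end

  # Skips leading whitespace, then matches regex at the current position.
  def token(regex)
    skip_ws
    @s.scan(regex)
  end

  # Reduced value rule: object / string / integer / true / false / null.
  def value
    object || token(STRING) || token(SCALAR)
  end

  # string colon value
  def member
    attempt { token(STRING) && token(/:/) && value }
  end

  # open_curly_brace (member (comma member)*)? close_curly_brace
  def object
    attempt do
      next nil unless token(/\{/)
      attempt { members } # optional group: rewinds if it fails
      token(/\}/)
    end
  end

  # member (comma member)*
  def members
    return nil unless member
    loop { break unless attempt { token(/,/) && member } }
    true
  end
end

p ['{}', '{"a": 1, "b": {"c": null}}', '{, "a": 1}', '{"a": 1,}'].map { |t| ObjectSketch.new(t).valid? }
# prints [true, true, false, false]
```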