textX/Arpeggio

Support for correct PEG syntax

jontxu opened this issue · 5 comments

Hello,

I have recently been using Arpeggio as a tool to test a grammar (re)written in PEG. It's a really good tool (though the debugging output is sometimes a bit hard to understand), but I found out that it doesn't really use the syntax defined in the original PEG paper by Bryan Ford.

The main differences are that the original syntax uses # rather than // for comments and does not end rules with a semicolon. I've seen that there is also a clean PEG alternative which doesn't follow the actual syntax either (it uses = instead of <-, as far as I know).

I understand that using the semicolon is much simpler than not using it when parsing PEG itself, but is there any other reason behind the decision? Are there any plans to support the correct PEG syntax? Arpeggio's syntax doesn't deviate much from the norm, so I think supporting it might be a good idea.

Thanks in advance.

Hi @jontxu. I don't remember why I chose to change the syntax slightly. I guess it was to make it more familiar to people used to C-like languages. Anyway, it is very easy to provide different syntaxes in Arpeggio. I wouldn't introduce backward-incompatible changes in the existing grammars, but I'm open to adding new syntaxes. It would be very easy to provide e.g. originalpeg with a syntax that follows the published paper exactly. If you want, you can look at how cleanpeg is implemented and contribute the new syntax.

Could it be that v1.9.0 has switched to // for grammar comments? Because after upgrading my project from 1.6.0 to 1.9.0, I got some errors about the # comments and managed to get rid of them by converting to // comments.

EDIT: I misread the issue starter post, so using // for comments actually is the way Arpeggio handles comments now.
But I know that it used to be different, because in 1.6.0 or 1.6.1 # comments were still working.

@schmittlauch You are right. Thanks. Comments in cleanpeg indeed changed in d80127d (released in 1.8.0) due to the introduction of the unordered group operator, which uses the # syntax. It was noted in the CHANGELOG and the relevant issue is #43.

So, both variants of PEG use // for comments now.

@jontxu So the # was dropped for comments due to colliding with the new unordered group operator. With the introduction of this operator Arpeggio is basically a superset of the original PEG.

@jontxu To support original PEG in your grammar you could use the cleanpeg parser and replace <- with = in the input grammar. In original PEG there is no unordered group, so you should be safe to replace # with // as well (as long as you don't use # in string/regex matches etc.).

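from arpeggio.cleanpeg import ParserPEG
# Plain textual rewrite: '<-' becomes '=', '#' comments become '//' comments.
grammar = orig_peg_grammar.replace('<-', '=').replace('#', '//')
parser = ParserPEG(grammar, root_rule_name)  # root_rule_name: name of your grammar's start rule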
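
If a grammar does use # inside string matches, the plain replace above would corrupt them. A minimal sketch of a more careful conversion follows. It assumes that literals are delimited by single or double quotes, that character classes are written in square brackets, and that backslash escapes a character inside them. The function name original_peg_to_cleanpeg is hypothetical and not part of Arpeggio. It handles only the two lexical differences discussed in this thread and does not translate any other construct.

```python
def original_peg_to_cleanpeg(text: str) -> str:
    """Rewrite '<-' to '=' and '#' comments to '//' comments.

    Hypothetical helper, not part of Arpeggio. Quoted literals and
    [...] character classes are copied through unchanged.
    """
    closers = {"'": "'", '"': '"', "[": "]"}
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in closers:
            # Copy a quoted literal or character class verbatim.
            end = closers[ch]
            j = i + 1
            while j < n and text[j] != end:
                # A backslash escapes the character that follows it.
                j += 2 if text[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(text[i:j])
            i = j
        elif ch == "#":
            # A comment runs to the end of the line; only its marker changes.
            j = text.find("\n", i)
            if j == -1:
                j = n
            out.append("//" + text[i + 1:j])
            i = j
        elif text.startswith("<-", i):
            out.append("=")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

The returned string can then be passed to ParserPEG from arpeggio.cleanpeg, together with the name of the grammar's start rule, in place of the result of the chained replace calls.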

Hello @igordejanovic, sorry for the delay in replying.

I think that this solution works for the time being, but I think a commonpeg parser would be good to have too, so that there is an actual way to parse the original syntax without workarounds.