textX/Arpeggio

Reserved Keywords ?

eliottparis opened this issue · 2 comments

Hi, first of all, thanks for this library!

Sorry if I'm missing something obvious, but I'm wondering how I can simply handle reserved keywords in my grammar with Arpeggio.

Suppose that I have some reserved keywords in my grammar, like class or function, and I don't want them to be recognized as valid identifiers:

from arpeggio import Kwd, EOF, ParserPython
from arpeggio import RegExMatch as _

##### GRAMMAR ################################################################

def identifier ():         return _(r'[a-zA-Z]\w*') # generate ambiguities with reserved keywords
# ...
def class_body ():         return '{', '}'
def class_name ():         return identifier
def class_declaration ():  return Kwd ('class'), class_name, class_body, EOF

##### MAIN ###################################################################

input_program = 'class class { }'
parser = ParserPython(class_declaration, ignore_case=False, debug=True)
parser.parse(input_program)

The code above parses the text 'class class { }' without errors, because the second word class matches the rule class_name:

?? Try match rule class_name=RegExMatch([a-zA-Z]\w*) in class_declaration at position 6 => class *class { }
   ++ Match 'class' at 6 => 'class *class* { }'

For now, I'm using the following workaround that excludes keywords from the identifier regex:

reserved_keywords = ['class', 'function'] # ...
def identifier (): return _(r'(?!\b({})\b)([a-zA-Z]\w*)'.format ('|'.join (reserved_keywords)))

It works as I expected:

arpeggio.NoMatch: Expected class_name at position (1, 7) => 'class *class { }'.
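
The lookahead regex from the workaround can be checked on its own with the standard re module, independently of Arpeggio. The following is a minimal sketch: the helper is_identifier is a hypothetical name used only for illustration, and the re.escape call is an added precaution that is not part of the original workaround.

```python
import re

reserved_keywords = ['class', 'function']

# re.escape is an added precaution (assumption, not in the original workaround):
# it keeps a keyword containing regex metacharacters from altering the pattern.
alternatives = '|'.join(re.escape(k) for k in reserved_keywords)

# Negative lookahead: fail if a whole keyword starts here, else match an identifier.
identifier_re = re.compile(r'(?!\b(?:{})\b)[a-zA-Z]\w*'.format(alternatives))

def is_identifier(word):
    """Hypothetical helper: True if the whole word matches the identifier pattern."""
    return identifier_re.fullmatch(word) is not None

for word in ['class', 'function', 'classy', 'Foo', '1abc']:
    print(word, is_identifier(word))
```

The trailing \b in the lookahead is what lets classy through while class is still rejected.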

But is there something more automatic in Arpeggio to achieve the same purpose? I'm thinking of something like the Keyword class in PyPEG, which internally maintains a list of the keywords used in the grammar.

Thanks!

Hi,
The PEG way is to use a negative lookahead (the Not predicate) to check that there is no keyword ahead before trying to match an identifier.

def identifier (): return Not(reserved_keywords), _(r'...')

Another approach is to implement your own parsing expression classes.
In this case you could inherit from RegExMatch and override the _parse method to verify that there is no keyword ahead before calling super.

The first approach is probably what you want. The second is an interesting exercise in making custom parsing expressions. :)
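
To illustrate what a Not predicate does in general, here is a small sketch written with plain Python closures. It is not Arpeggio's implementation: the combinators regex, not_ and seq are hypothetical names made up for this example, and each parser simply returns the new position on success or None on failure.

```python
import re

def regex(pattern):
    """Parser that matches `pattern` at the current position."""
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return m.end() if m else None   # new position, or None on failure
    return parse

def not_(parser):
    """Negative lookahead: succeeds, consuming nothing, only if `parser` fails."""
    def parse(text, pos):
        return pos if parser(text, pos) is None else None
    return parse

def seq(*parsers):
    """Sequence: run the parsers one after another, fail as soon as one fails."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

# The trailing \b keeps 'classy' from being treated as the keyword 'class'.
keyword = regex(r'(?:class|function)\b')
identifier = seq(not_(keyword), regex(r'[a-zA-Z]\w*'))

print(identifier('class', 0))    # None (a keyword is ahead, so the predicate fails)
print(identifier('classy', 0))   # 6
print(identifier('Foo', 0))      # 3
```

Because the predicate consumes no input, the identifier regex still starts matching at the same position when no keyword is ahead.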

Hi,

Thank you for your quick reply!

I was actually looking for a solution to handle keywords of the grammar more dynamically. The following code will do the job for me:

from arpeggio import StrMatch, Not
from arpeggio import RegExMatch as _

class Kwd (StrMatch):
   _keyword_table = []
   def __init__ (self, to_match):
      super (Kwd, self).__init__ (to_match)
      self.to_match = to_match
      self.root = True
      self.rule_name = 'keyword'
      if to_match not in Kwd._keyword_table:
         Kwd._keyword_table.append (to_match)

def reserved_keywords ():  return _(r'(\b({})\b)'.format ('|'.join (Kwd._keyword_table)))
def identifier ():         return Not (reserved_keywords), _(r'([a-zA-Z]\w*)')
# ...
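
One thing to watch with a pattern built from a keyword table like this: if the table is still empty when the pattern is built, '|'.join yields an empty string and the resulting regex matches at any word boundary, so the Not predicate would reject every identifier. The sketch below shows one possible way to guard against that using only the re module. KeywordTable and its methods are hypothetical names, not part of Arpeggio, and the re.escape call is an added precaution.

```python
import re

class KeywordTable:
    """Hypothetical helper: collects keywords and builds a regex for them."""

    def __init__(self):
        self._keywords = []

    def add(self, word):
        # Register each keyword once, preserving insertion order.
        if word not in self._keywords:
            self._keywords.append(word)
        return word

    def pattern(self):
        if not self._keywords:
            # An empty alternation would match at every word boundary;
            # '(?!)' is a pattern that can never match.
            return r'(?!)'
        escaped = '|'.join(re.escape(k) for k in self._keywords)
        return r'\b(?:{})\b'.format(escaped)

table = KeywordTable()
print(re.match(table.pattern(), 'foo'))        # None: no keywords registered yet

table.add('class')
table.add('function')
print(table.pattern())
print(bool(re.match(table.pattern(), 'class {')))
print(bool(re.match(table.pattern(), 'classy')))
```

Building the pattern lazily, at the point where the rule is actually evaluated, lets it pick up every keyword registered so far.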