nicolashernandez/PyRATA

pattern token with colon characters in the value field

Closed this issue · 1 comment

The error can be reproduced with the following command line:
python3 pyrata_re.py 'pos="PRO:PER"' "[{'lemma': 'se', 'pos': 'PRO:PER'}]" --pyrata_data --log
The command fails with the following traceback:

Traceback (most recent call last):
  File "pyrata_re.py", line 137, in main
    result = compiled_nfa.search(s, mode = mode, pos = pos, endpos = endpos)
  File "/media/hernandez-n/ext4/workspace/17/PyRATA/pyrata/pyrata/compiled_pattern.py", line 563, in search
    an_nfa.step(c, self.lexicons)
  File "/media/hernandez-n/ext4/workspace/17/PyRATA/pyrata/pyrata/nfa.py", line 203, in step
    states_add.update(self.__step_special_state(char, None, cs, lexicons))
  File "/media/hernandez-n/ext4/workspace/17/PyRATA/pyrata/pyrata/nfa.py", line 328, in __step_special_state
    states_add.update(self.__step_special_state(char, state, os, lexicons))
  File "/media/hernandez-n/ext4/workspace/17/PyRATA/pyrata/pyrata/nfa.py", line 237, in __step_special_state
    step_evaluation = state.symbolic_step_expression[0].subs(substitution_list)
AttributeError: 'tuple' object has no attribute 'subs'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pyrata_re.py", line 299, in <module>
    main() # (sys.argv) # FIXME sys ?
  File "pyrata_re.py", line 140, in main
    except pyrata.nfa.CompiledPattern.InvalidRegexPattern as e:
AttributeError: module 'pyrata.nfa' has no attribute 'CompiledPattern'

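The problem comes from the sympy dependency. Its symbols() function interprets the colon ':' as range syntax (for example, symbols('x:3') returns (x0, x1, x2)), as described in [1]. When the value contains a colon, symbols() therefore returns a tuple of symbols instead of a single Symbol, and calling .subs() on that tuple raises the AttributeError shown above.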
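A minimal reproduction, assuming sympy is installed (behaviour may differ slightly across sympy versions). The unescaped constraint string has the letter range 'O:P' expanded, while the backslash-escaped colon yields a single Symbol. The variable names are illustrative only.

```python
from sympy import Symbol, symbols

# Unescaped: sympy reads "O:P" inside "PRO:PER" as a letter range (O, P),
# so symbols() returns a tuple of two symbols instead of one Symbol.
broken = symbols('pos="PRO:PER"')
print(type(broken).__name__, broken)

# Escaped: "\:" tells symbols() to keep the colon as a literal character,
# so a single Symbol carrying the original name is returned.
fixed = symbols('pos="PRO\\:PER"')
print(isinstance(fixed, Symbol), fixed)

# Only a sympy expression such as Symbol has .subs(); a plain tuple does not,
# which is the AttributeError seen in the traceback.
print(hasattr(broken, "subs"), hasattr(fixed, "subs"))
```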

In syntactic_step_parser.py, the line

    var[indice] = symbols(single_constraint_string.replace(' ','\\ '))

is rewritten as

    var[indice] = symbols(single_constraint_string.replace(' ','\\ ').replace(':','\\:'))

To fix the issue, I escaped the colon character. The fix will be included in the next push.
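The same idea could be generalized: symbols() also splits its input on commas and whitespace, so a value such as "1,2" would be mangled in a similar way. The sketch below uses a hypothetical helper, escape_symbol_name (not part of PyRATA or sympy), that backslash-escapes the three characters symbols() accepts an escape for: space, colon and comma. Other whitespace such as tabs has no escape form and is left untouched.

```python
# Characters that sympy's symbols() treats specially but accepts
# backslash-escaped (hypothetical helper, not part of PyRATA or sympy).
_SYMPY_ESCAPES = {" ": "\\ ", ":": "\\:", ",": "\\,"}


def escape_symbol_name(name: str) -> str:
    """Backslash-escape space, colon and comma in `name` so that symbols()
    builds a single Symbol for it (assuming no other whitespace is present)."""
    # One pass over the characters keeps every escape rule in a single table.
    return "".join(_SYMPY_ESCAPES.get(ch, ch) for ch in name)


if __name__ == "__main__":
    for constraint in ['pos="PRO:PER"', 'lemma="a b"', 'raw="1,2"']:
        print(constraint, "->", escape_symbol_name(constraint))
```

Under this assumption, the parser line could read var[indice] = symbols(escape_symbol_name(single_constraint_string)).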

[1] http://docs.sympy.org/1.0/_modules/sympy/core/symbol.html