decalage2/ViperMonkey

Alternative parser: test Lark

decalage2 opened this issue · 0 comments

In practice, the current VBA parser, implemented with pyparsing, is very slow. In the past I ran some tests with ANTLR4 (issue #19), but its Python runtime is even slower than pyparsing.
Other issues with pyparsing:

  • the grammar of the parser and the VBA emulation layer are mixed together, because we're using the same classes for both. While this is convenient for a small grammar, for a complex parser such as ViperMonkey's it makes maintenance difficult, and it is impossible to test different parsers without touching the emulation layer.
  • the current grammar tries to parse the whole VBA code in one go, so any exception aborts the whole parse. I started to develop a line-based parser, but it's too much work.
  • debugging the grammar is very difficult.
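
To illustrate the line-based idea from the second point, here is a minimal sketch in plain Python. It is not ViperMonkey code: the `Assignment` and `Unparsed` classes, the `parse_line` and `parse_lines` helpers, and the toy `name = expression` pattern are all hypothetical, and VBA line continuations (`_`) are ignored. The only point is that a failure on one line is recorded and does not abort the rest of the parse.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Assignment:
    """Hypothetical AST node for `name = expression`."""
    name: str
    value: str


@dataclass
class Unparsed:
    """Placeholder node for a line the parser could not handle."""
    line_no: int
    text: str
    error: str


# Deliberately tiny stand-in for a real VBA statement grammar.
_ASSIGN = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*(.+?)\s*$")


def parse_line(text: str) -> Assignment:
    """Parse one line, raising ValueError if it is not supported."""
    match = _ASSIGN.match(text)
    if match is None:
        raise ValueError("unsupported statement")
    return Assignment(match.group(1), match.group(2))


def parse_lines(source: str) -> List[Union[Assignment, Unparsed]]:
    """Parse line by line; a bad line yields an Unparsed node instead of failing."""
    nodes: List[Union[Assignment, Unparsed]] = []
    for line_no, text in enumerate(source.splitlines(), start=1):
        if not text.strip():
            continue  # skip blank lines
        try:
            nodes.append(parse_line(text))
        except ValueError as exc:
            nodes.append(Unparsed(line_no, text, str(exc)))
    return nodes
```

With this shape, the emulation layer could decide for itself whether to skip or report `Unparsed` nodes, instead of losing the whole module to a single exception.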

Lark is another parser for Python that looks faster than pyparsing, and it could allow us to separate the parser from the VBA emulation engine: https://github.com/lark-parser/lark

And there are other options: https://tomassetti.me/parsing-in-python/