els0r/goProbe

Use ANTLR for condition parsing

basman opened this issue · 5 comments

basman commented

Looking at the complexity of the parser and the auto completion after we added the direction filter, using ANTLR as a lexer/parser could reduce code complexity.

Pro:

  • Use same grammar file for parser and auto-completion, the former will build the AST and the latter will recognize the many verification cases - both based on the same grammar file.
  • Using ANTLR would ease extension of the syntax to support a more liberal filter style like tcpdump.

Con:

  • Building requires ANTLR to be installed. I don't understand yet, if the ANTLR go package is sufficient (good) or if the executable antlr tools are required (bad) in order to produce the parser.

Off the top of my head I think introducing ANTLR as a dependency seems a bit of an overkill to me, for the following reasons:

  • One of our main goals (in particular when working on the v4 release) was to reduce weight / dependencies of goProbe (from a quick glance I'm not sure about the exact set of dependencies either, but let's say this: If executables / system libraries / CGO are / is required I'd consider that a red flag / blocker, because we went through a lot of trouble to get rid of any system dependencies and adding a new one would basically be a huge step backwards into v3 land).
  • The parser, while not being the most straightforward implementation (I still can't wrap my head around the full logic TBH, but then again I was never one for lexer logic 😛) has been sufficient to cover all use cases in the past - the scenario of an additional direction filter is / was the first time we had to touch the code in years. I'm not convinced that this is something that will be touched again [often] in the future (and just doing it for the sake of being tcpdump-style doesn't warrant the probably significant effort IMHO).
  • Aside from the effort involved I wonder if that would be a breaking change? "Our" grammar is a lot more flexible than e.g. tcpdump (e.g. supports |, || andor to name just one). Can ANTLR cover all these cases (maybe a stupid question, but having never worked with it I'm just putting it out there)?

That being said, I'm of course very opinionated on these things (mostly because of the point of minimizing dependencies, in particular seemingly complex ones like ANTLR) - If someone wants to take a shot at replacing the current lexer (while not introducing any red flags as hinted above), by all means.

@els0r What is your take? Anything to add?

els0r commented

I can totally follow the reasoning to use a grammar parser for the tcpdump-style syntax. Why reinvent the wheel? The support for it was discussed between basman and me previously. As it stands, most NOC engineers are still more familiar with the tcpdump syntax.

So why not support it?

After all, it’s just a subset of the rich query syntax we offer with goquery.

What I see as unreasonable overhead is to try to mold the current grammar into a format that ANTLR can use it. I’m also a burnt child regarding auto-generated code, so let’s proceed in an exploratory fashion and see if it’s worth it.

I don’t feel that strongly about introducing additional dependencies. But only if they don’t increase complexity in the build and later deployment of the software suite. Also, if they drag us back into C-Land, I’m less inclined.

  • TL;DR: what Kohn said, but all ears for the idea 😎
basman commented

Just to complete my initial idea, after playing around with ANTLR. We could generate the parser once and commit the generated go files to the repo. This way we only need ANTLR on the developer's machine when changing the grammar, avoiding another build dependency on build hosts.

basman commented

Still the effort to replace the parser with ANTLR is significant and its not needed.

@basman Thanks a lot for putting in the effort and researching the potential impact - if it would only be required on a development machine (and even then only if the parser is touched) that would of course render most of my remarks irrelevant, so I'll leave it in your capable hands to decide whether it's worth the effort or not. If a different syntax makes more people / engineers use the tool then yay!