Igalia/pflua

Pflua's parser supports a subset of pflang's syntax; automatically use bpf-lua if it fails?

kbara opened this issue · 4 comments

Pflua's parser supports less things than libpcap's does. The bpf-lua pipeline of pflua uses libpcap to do the parsing, so has perfect libpcap parsing compatibility by definition. A wild thought: if parsing fails, should we automatically switch to using the bpf-lua pipeline?

Benefits:
a) There are far less (or no, if we're really optimistic) parsing issues left if we do this
b) The code is still compiled to lua and run by luajit, giving us nice speed increases compared to using libpcap throughout (it just goes through bpf asm in the middle and has less optimizations applied than the pure-lua pipeline does).

Downsides:
a) Are people willing to silently have libpcap be used, and are they ok with needing libpcap installed if they use unusual pflang syntax?
b) The bpf-lua pipeline is a little slower than the pure-lua pipeline, and this could mask minor syntax issues that could be easily added to pflua's parser if they were noticed. It's still much faster than the non-pflua alternatives, though.

This has come up in a couple of issues:
#159 [having the net primitive accept dotted quads, reported by someone who wants to be able to do that]
#161 [our parser doesn't support dotted triples/pairs/bare numbers as ipv4 addresses, as pflang allows]
#82 [our parser doesn't support negative numbers, either simply or in their full generality]
@andywingo @dpino @lukego

This is a good question. The quid here is whether to automatically fallback or not. In my opinion, I'd let the user decide what to do. If pflua fallbacks automatically, bugs would go unnoticed. What's more, the user will be trying first to run the expression with the native pipeline, and then the bpf pipeline. It's better to know an expression is failing and pass bpf = true to pflua. The packet_filter app in snabbswitch can pass down parameters to pflua.

On the other hand, expressions that don't always compile due to syntax errors, etc has been a constant in the development of the project. For me there are syntax issues or features which are more important than others. In the case of the bug reported by @eugeneia, it was important enough to got it fixed cause it affected to a very common operator, net. The problem was not the lack of support of shorter dotted addresses, but not passing a mask number. The current implementation didn't consider that case.

With regard to #161, having shorter IPv4 addresses is nice, but not a priority IMHO. It's always possible to write '192.168' as '192.168.0.0', there's a reasonable way to work around the bug. Still, I'm working on it.

So my conclusion is I wouldn't fallback automatically and I think is a good thing not to do it so we can know what's missing, not working well, etc. Fix the things that are important, take our time to fix those which are less important.

I'm of two minds on this. On one hand, I strongly agree with everything you said. On the other hand, I see the current approach sane for us, and for one level of users away (the Snabb folk), but at two levels away (people building other apps within Snabb) or worse, 3 levels away (end users who have pflang expressions they want to use), I'm less certain about which approach makes most sense. For the last group, in particular, it seems a lot like exposing an implementation detail that shouldn't be something they have to think about, even if it's just a matter of them having to respond to a message saying "$x failed, rerun with -bpf".
I don't think we should make this change yet, if we make it, but I think it's worth thinking about whether it's a change we want to make in the not-so-distant future.

From a Snabb Switch perspective: We are interested in consistent and preferably high worst-case performance. Are there any performance considerations with regard to pflua vs bpf-lua? It could be surprising to end users if performance decreased significantly due to the usage of certain pflang keywords.

@eugeneia Honestly, we need better data on that, preferably from benchmarking extensively on real and artificial network workloads. That said, here are the results from pflua-bench run against pflua's HEAD; take them with a grain of salt. They showed bpf-lua doing better than I expected, though its performance on "portrange 0-6000" in the last picture suggests that it does have unexpected cases where it does significantly worse than pflua, while still being a lot faster than libpcap. Between that and switching to it implying a different set of bugs (we reported one in libpcap yesterday - it was a mis-optimization that made both bpf-lua and libpcap match packets that both our pure-lua implementation and and unoptimized libpcap did not match), I think I'm convinced that automatically falling back to bp-lua is the wrong way to go, though manually falling back will sometimes be a sane thing to do.

1gb-1kb-tcp-port-5555
ping-flood
wingolog org-1
wingolog org-2