BurntSushi/aho-corasick

Potential performance improvement, maybe

itamarst opened this issue · 3 comments

At https://github.com/BurntSushi/aho-corasick/blob/master/src/classes.rs#L46, the alphabet_len is calculated each time the function is called. This function is called in (what appears to me to be) a hot inner function, next_state(). It seems like this value doesn't change once the BytesClasses are constructed, so there's no need to recalculate each time, it could be recalculated and cached only when the ByteClasses changes.

(I imagine the compiler might be smart enough to figure this out? But then again maybe not.)

It is plausible that this is a problem, it's plausible that the compiler might be smart enough to work-around this and it's even plausible that the compiler is not smart enough and it still might not be a problem. These state machine loops tend to be limited by how quickly one can shovel bytes into CPU registers, since DFAs don't tend to have great locality properties.

I think the thing to do here is to scrutinize the generated codegen and see if the +1 is getting optimized out or not. Regardless of that, the next step I think would be to very rigorously measure the perf difference of storing the alphabet length vs computing it. I've always found it somewhat difficult to reliably observe perf differences with small tweaks like this to the DFA matching loop.

So I tried this, and... benchmark diffs are all over the place, and in enough places that I suspect it's just noise 😢 So I can't really tell if it makes a difference or not.

One potential approach is something like https://github.com/bheisler/iai, which would give consistent results. I've been using the equivalent for some of my projects, and it's really nice to be able to tell if a microoptimization worked or not (presuming that the emulated CPU is actually the same as a real CPU, which of course it is not).

@itamarst Yeah, in those cases, I tend to revert to ad hoc benchmarks using the aho-corasick-debug tool. That is, I run it on much bigger inputs on an otherwise idle system. This doesn't necessarily get rid of noise, but it does tend to help.

iai definitely looks worth a try. I hadn't heard of it before. It will likely be a bit before I get around to that though.