BurntSushi/aho-corasick

In what circumstances is the SIMD code used?

itamarst opened this issue · 3 comments

Hi,

Thanks for the library! Already see some speedups in my prototype benchmarks.

Was curious in what circumstances SIMD will be used? Basically I am trying to optimize purely for speed of running the regex, compile time and memory usage don't matter, so anything I can do to choose fast paths would help.

To expand: pretty sure the code was compiled with SIMD (e.g. if I do target=native I see AVX2 in the assembly). The question is which code paths will hit the fast path.

Ah, found DESIGN. I guess that basically answers the question (and suggests checking for 5000 keywords won't use SIMD).

Yes, 5,000 keywords will definitely blow the SIMD budget currently. The place in the code that determines whether to use the Teddy algorithm (the only explicitly SIMD algorithm in this crate, although memchr is used in some circumstances too) is here:

fn build_imp(&self, patterns: &Patterns) -> Option<Teddy> {
use packed::teddy::runtime;
// Most of the logic here is just about selecting the optimal settings,
// or perhaps even rejecting construction altogether. The choices
// we have are: fat (avx only) or not, ssse3 or avx2, and how many
// patterns we allow ourselves to search. Additionally, for testing
// and benchmarking, we permit callers to try to "force" a setting,
// and if the setting isn't allowed (e.g., forcing AVX when AVX isn't
// available), then we bail and return nothing.
if patterns.len() > 64 {
return None;
}
let has_ssse3 = is_x86_feature_detected!("ssse3");
let has_avx = is_x86_feature_detected!("avx2");
let avx = if self.avx == Some(true) {
if !has_avx {
return None;
}
true
} else if self.avx == Some(false) {
if !has_ssse3 {
return None;
}
false
} else if !has_ssse3 && !has_avx {
return None;
} else {
has_avx
};
let fat = match self.fat {
None => avx && patterns.len() > 32,
Some(false) => false,
Some(true) if !avx => return None,
Some(true) => true,
};

So for Teddy, the maximum number of patterns is 64. The problem is that if you have too many, it overwhelms the algorithm and tends to generate too many false positive candidates. Then checking the candidates ends up dominating search time.

There are other SIMD algorithms that might scale better. Teddy was taken from Hyperscan, which also has an FDR algorithm that should work for larger numbers of literals. But I haven't had a chance to port that yet.

Otherwise, see also the AhoCorasickBuilder::auto_configure method, which also tweaks some performance options. You'll definitely want to enable the use of a DFA for 5,000 keywords, assuming that compilation time doesn't matter too much.

Hope that helps. Happy to answer other questions!