BurntSushi/aho-corasick

A lot of two-letter combinations break ASCII case-insensitivity

untitaker opened this issue · 3 comments

The following script tests all ASCII two-letter combinations by searching for the lowercase needle in the uppercase haystack. I would expect it to output nothing, but a number of combinations fail with 0.7.9.

The real-world test that triggered this investigation was the haystack SECRET_KEY with the needle secret_key.

extern crate aho_corasick; // 0.7.9

fn main() {
    // Inclusive ranges, so that 'z' is covered as well.
    for c in b'a'..=b'z' {
        for c2 in b'a'..=b'z' {
            let c = c as char;
            let c2 = c2 as char;
            let needle = format!("{}{}", c, c2).to_lowercase();
            let haystack = needle.to_uppercase();
            let ac = aho_corasick::AhoCorasickBuilder::new()
                .ascii_case_insensitive(true)
                .build(&[&needle]);
            let finds: Vec<_> = ac.find_iter(&haystack).collect();
            if finds.is_empty() {
                println!("needle = {}, haystack = {} => broken", needle, haystack);
            }
        }
    }
}
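For reference, the behavior the script expects can be written down with the standard library alone. This is only a minimal sketch of a naive oracle to compare results against. The helper name `find_ascii_case_insensitive` is made up for illustration and is not part of aho-corasick, and it says nothing about how the crate implements matching internally.

```rust
/// Hypothetical reference oracle (not part of aho-corasick).
/// Returns the start offset of the first ASCII case-insensitive
/// occurrence of `needle` in `haystack`, using a naive scan.
fn find_ascii_case_insensitive(haystack: &str, needle: &str) -> Option<usize> {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    // `windows(0)` panics, so treat the empty needle as matching at offset 0.
    if n.is_empty() {
        return Some(0);
    }
    // Compare every window of the needle's length, ignoring ASCII case only.
    h.windows(n.len()).position(|w| w.eq_ignore_ascii_case(n))
}
```

With this helper, `find_ascii_case_insensitive("SECRET_KEY", "secret_key")` gives `Some(0)`, which is the kind of match the script expects `find_iter` to report for every pair.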

Sigh. Looks like this is a regression I introduced in my fix for #53 in #54. In particular, if prefilters are disabled, then your test will pass.

I've put up a fix in #55. Thank you very much for the easy reproduction. I've converted it to a regression test.

The fix for this is in aho-corasick 0.7.10 on crates.io.

Thanks!