BurntSushi/aho-corasick

Duplicate match when using find_overlapping_iter() with ascii_case_insensitive

jwass opened this issue · 3 comments

jwass commented

Hello, I think there may be a bug in find_overlapping_iter() with ascii_case_insensitive. I narrowed it down to the following example:

fn main() {
    use aho_corasick::AhoCorasickBuilder;

    let patterns = &["abc", "def", "abcdef"];
    let haystack = "abcdef";

    let ac = AhoCorasickBuilder::new()
        .ascii_case_insensitive(true)
        .build(patterns);
    for mat in ac.find_overlapping_iter(haystack) {
        println!("{:?}", mat);
    }
}

I'd expect a match for each of "abc", "def", and "abcdef", but the output has two entries for "def":

Match { pattern: 0, len: 3, end: 3 }
Match { pattern: 2, len: 6, end: 6 }
Match { pattern: 1, len: 3, end: 6 }
Match { pattern: 1, len: 3, end: 6 }

If I change the line to .ascii_case_insensitive(false) then it outputs what I'd expect:

Match { pattern: 0, len: 3, end: 3 }
Match { pattern: 2, len: 6, end: 6 }
Match { pattern: 1, len: 3, end: 6 }

Is this a bug or am I misunderstanding how the overlapping iter should work?
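For reference, here is a minimal brute-force sketch of the match set I'd expect, written without the crate. The `NaiveMatch` struct and `naive_overlapping` function are hypothetical names for illustration only. They are not part of aho-corasick and do not reflect how the library searches. The sketch checks every pattern at every end offset, ignoring ASCII case, so each occurrence is reported exactly once. Matches that end at the same offset may come out in a different order than the library's.

```rust
/// Hypothetical stand-in for the fields printed above (pattern, len, end).
/// This is not the crate's `Match` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NaiveMatch {
    pattern: usize,
    len: usize,
    end: usize,
}

/// Brute force: for every end offset, test each pattern against the bytes
/// that end there, ignoring ASCII case. Each occurrence is pushed once.
fn naive_overlapping(patterns: &[&str], haystack: &str) -> Vec<NaiveMatch> {
    let hay = haystack.as_bytes();
    let mut out = Vec::new();
    for end in 1..=hay.len() {
        for (pattern, pat) in patterns.iter().enumerate() {
            let pat = pat.as_bytes();
            // Skip empty patterns and patterns too long to end at `end`.
            if pat.is_empty() || pat.len() > end {
                continue;
            }
            if hay[end - pat.len()..end].eq_ignore_ascii_case(pat) {
                out.push(NaiveMatch { pattern, len: pat.len(), end });
            }
        }
    }
    out
}

fn main() {
    let patterns = ["abc", "def", "abcdef"];
    for m in naive_overlapping(&patterns, "abcdef") {
        println!("{:?}", m);
    }
}
```

This yields three matches for the haystack above, one per pattern, which is what I'd expect from the overlapping iterator as well.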

Looks like a bug to me, yes. The ASCII case insensitive functionality has had a lot of problems that produce strange outcomes, so this wouldn't surprise me. I'm not sure when I'll have a chance to look at it, though. Thanks for the small reproduction!

This should be fixed on crates.io in aho-corasick 0.7.15.
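For anyone who cannot upgrade yet and still sees the duplicate, one possible workaround is to filter repeated matches on the caller's side. The sketch below is only an illustration. `MatchKey` and `dedup_matches` are hypothetical names, and the tuples stand in for the (pattern, len, end) values printed in the report rather than the crate's own match type.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a reported match: (pattern, len, end).
type MatchKey = (usize, usize, usize);

/// Keep the first occurrence of each distinct match, preserving order.
fn dedup_matches(matches: impl IntoIterator<Item = MatchKey>) -> Vec<MatchKey> {
    let mut seen = HashSet::new();
    // `insert` returns false when the value was already present,
    // so repeated matches are filtered out.
    matches.into_iter().filter(|m| seen.insert(*m)).collect()
}

fn main() {
    // The four matches from the report, including the duplicated "def".
    let reported = vec![(0, 3, 3), (2, 6, 6), (1, 3, 6), (1, 3, 6)];
    for (pattern, len, end) in dedup_matches(reported) {
        println!("pattern: {}, len: {}, end: {}", pattern, len, end);
    }
}
```

A `HashSet` is used rather than `Vec::dedup` so that the filter does not rely on duplicates being adjacent.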

jwass commented

Wow. Thanks!