Duplicate match when using find_overlapping_iter() with ascii_case_insensitive
jwass opened this issue · 3 comments
Hello, I think there may be a bug in `find_overlapping_iter()` with `ascii_case_insensitive`. I narrowed it down to the following example:
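```rust
// Written against aho-corasick 0.7.x, where build() returns an AhoCorasick directly.
fn main() {
    use aho_corasick::AhoCorasickBuilder;

    let patterns = &["abc", "def", "abcdef"];
    let haystack = "abcdef";
    let ac = AhoCorasickBuilder::new()
        .ascii_case_insensitive(true)
        .build(patterns);
    // Overlapping search should report every occurrence of every pattern exactly once.
    for mat in ac.find_overlapping_iter(haystack) {
        println!("{:?}", mat);
    }
}
```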
I'd expect a match for each of "abc", "def", and "abcdef", but the output has two entries for "def":
```
Match { pattern: 0, len: 3, end: 3 }
Match { pattern: 2, len: 6, end: 6 }
Match { pattern: 1, len: 3, end: 6 }
Match { pattern: 1, len: 3, end: 6 }
```
If I change the line to `.ascii_case_insensitive(false)`, then it outputs what I'd expect:
```
Match { pattern: 0, len: 3, end: 3 }
Match { pattern: 2, len: 6, end: 6 }
Match { pattern: 1, len: 3, end: 6 }
```
Is this a bug or am I misunderstanding how the overlapping iter should work?
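One way to pin down the expected overlapping semantics is a brute-force reference: check every pattern at every end offset and record each occurrence once. The sketch below is a hypothetical oracle written for illustration, not part of aho-corasick. The name `naive_overlapping` and the `(pattern, len, end)` tuple layout are assumptions chosen to mirror the `Match` fields printed above. Because the patterns and the haystack are all lowercase, toggling ASCII case insensitivity should not change the result.

```rust
/// Hypothetical reference oracle (not part of aho-corasick).
/// Returns every (pattern index, length, end offset) where a pattern occurs,
/// including overlapping occurrences, each exactly once. Assumes non-empty patterns.
fn naive_overlapping(
    patterns: &[&str],
    haystack: &str,
    ascii_case_insensitive: bool,
) -> Vec<(usize, usize, usize)> {
    let hay = haystack.as_bytes();
    let mut matches = Vec::new();
    // Walk end offsets left to right; at each offset, test every pattern once.
    for end in 1..=hay.len() {
        for (pid, pat) in patterns.iter().enumerate() {
            let p = pat.as_bytes();
            if p.len() > end {
                continue; // pattern cannot fit before this end offset
            }
            let window = &hay[end - p.len()..end];
            let found = if ascii_case_insensitive {
                window.eq_ignore_ascii_case(p)
            } else {
                window == p
            };
            if found {
                matches.push((pid, p.len(), end));
            }
        }
    }
    matches
}

fn main() {
    let patterns = ["abc", "def", "abcdef"];
    let sensitive = naive_overlapping(&patterns, "abcdef", false);
    let insensitive = naive_overlapping(&patterns, "abcdef", true);
    println!("{:?}", insensitive);
    // With all-lowercase input, the case-insensitivity flag must not add or drop matches.
    assert_eq!(sensitive, insensitive);
}
```

Within a single end offset the oracle lists matches in pattern order, whereas the crate printed pattern 2 before pattern 1 at end 6, so a comparison against real output should treat both lists as sets rather than compare order.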
Looks like a bug to me, yes. There have been a lot of problems with the ASCII case-insensitive functionality that result in strange outcomes, so it wouldn't surprise me. Not sure when I'll have a chance to look at this, though. Thanks for the small reproduction!
This should be fixed on crates.io in aho-corasick 0.7.15.
Wow. Thanks!
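After upgrading to 0.7.15, a regression test could assert that the overlapping output never repeats a `(pattern, len, end)` triple. The sketch below runs that check on the two outputs transcribed from this report instead of calling the crate. `first_duplicate` and the `Reported` alias are hypothetical names. In a real test, the tuples would be collected from `find_overlapping_iter`, using the `pattern()`, `len()` and `end()` accessors that `Match` exposes in 0.7.x.

```rust
use std::collections::HashSet;

/// (pattern, len, end), the fields printed in the report.
/// A plain tuple standing in for the crate's Match type (an assumption for this sketch).
type Reported = (usize, usize, usize);

/// Returns the first match that has already been seen earlier in the list, if any.
fn first_duplicate(matches: &[Reported]) -> Option<Reported> {
    let mut seen = HashSet::new();
    // HashSet::insert returns false when the value was already present.
    matches.iter().copied().find(|m| !seen.insert(*m))
}

fn main() {
    // Output reported with ascii_case_insensitive(true) before the fix.
    let buggy = [(0, 3, 3), (2, 6, 6), (1, 3, 6), (1, 3, 6)];
    // Output reported with ascii_case_insensitive(false).
    let expected = [(0, 3, 3), (2, 6, 6), (1, 3, 6)];

    assert_eq!(first_duplicate(&buggy), Some((1, 3, 6)));
    assert_eq!(first_duplicate(&expected), None);
}
```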