rust-lang/regex

Inconsistent behavior with zero-width matches on empty strings

rootCircle opened this issue

What version of regex are you using?

v1.10.3

Describe the bug at a high level.

replace_all in the regex crate handles empty matches differently from Python's standard library re module. Rust's regex does not report an empty match that begins exactly where the previous non-empty match ended (in the example below, the empty string between "x" and "d"), whereas Python (3.7 and later) does, so Python performs one more replacement.
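To make the difference concrete, here is a small self-contained sketch that enumerates the match spans of a pattern of the form c* (zero or more of one character) under both policies, without using the regex crate. The helpers star_len and star_spans are hypothetical and written only for this illustration: they model the behavior observed in this report for this simple kind of pattern, not the crate's actual search implementation.

```rust
/// Hypothetical helper (not part of the regex crate): byte length of the
/// greedy match of the pattern `c*` starting at byte offset `at`.
/// It always succeeds, possibly with length 0 (an empty match).
fn star_len(hay: &str, c: char, at: usize) -> usize {
    hay[at..]
        .chars()
        .take_while(|&ch| ch == c)
        .map(char::len_utf8)
        .sum()
}

/// Hypothetical helper: all non-overlapping match spans of `c*` in `hay`.
/// `python_style = true`: an empty match directly after a non-empty match
/// is reported (Python 3.7+ behavior).
/// `python_style = false`: such an empty match is skipped (the behavior
/// observed with the Rust regex crate in this report).
fn star_spans(hay: &str, c: char, python_style: bool) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut pos = 0;
    let mut last: Option<(usize, usize)> = None;
    loop {
        let end = pos + star_len(hay, c, pos);
        // An empty match starting where the previous match ended: both
        // policies skip it after an empty match; only the Rust-style
        // policy also skips it after a non-empty match.
        let skip = end == pos
            && match last {
                Some((start, prev_end)) if prev_end == pos => {
                    start == prev_end || !python_style
                }
                _ => false,
            };
        if skip {
            // Advance by one character and search again; stop at the end.
            match hay[pos..].chars().next() {
                Some(ch) => pos += ch.len_utf8(),
                None => break,
            }
            continue;
        }
        spans.push((pos, end));
        last = Some((pos, end));
        pos = end;
    }
    spans
}

fn main() {
    // Rust-style policy: [(0, 0), (1, 1), (2, 3), (4, 4)]
    println!("{:?}", star_spans("abxd", 'x', false));
    // Python-style policy: [(0, 0), (1, 1), (2, 3), (3, 3), (4, 4)]
    println!("{:?}", star_spans("abxd", 'x', true));
}
```

For x* on "abxd", the only difference between the two lists is the empty span 3..3 that directly follows the "x" match at 2..3.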

What are the steps to reproduce the behavior?

  1. Create a Regex object with the pattern r"x*" (matches zero or more "x"s).
  2. Apply replace_all to the string "abxd" with a hyphen as the replacement string.
  3. Observed output (Rust): "-a-b-d-"
  4. Expected output (Python): "-a-b--d-"

Rust code:

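// Requires the regex crate (v1.10.3 in this report) as a dependency.
use regex::Regex;

fn main() {
    // x* matches zero or more "x" characters, so it can also match the empty string.
    let re = Regex::new(r"x*").unwrap();
    let hay = "abxd";

    // Replace every match, including empty ones, with "-".
    // Observed output: "-a-b-d-"
    println!("{:?}", re.replace_all(hay, "-"));
}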

Equivalent Python code:

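import re

regex = r"x*"
test_str = "abxd"
subst = "-"

# count=0 (replace all matches) is the default, and re.MULTILINE has no
# effect on this pattern, so both are omitted here.
result = re.sub(regex, subst, test_str)

print(result)  # -a-b--d-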

What is the actual behavior?

In Rust, replace_all replaces the empty strings before "a", before "b" and at the end of the string, as well as the "x" itself, but not the empty string between "x" and "d".

What is the expected behavior?

As in Python, the empty string between "x" and "d" should also be replaced, resulting in "-a-b--d-".
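A replace-all is essentially a splice over the list of match spans, so the empty span 3..3 that Python reports and Rust skips accounts for the extra hyphen. The sketch below reproduces both outputs; replace_spans is a hypothetical helper, and the two span lists are traced by hand for x* on "abxd" rather than produced by either engine.

```rust
/// Hypothetical helper: write `rep` in place of every span and copy the
/// text between spans unchanged, which is how a replace-all turns a list
/// of non-overlapping match spans into an output string.
fn replace_spans(hay: &str, spans: &[(usize, usize)], rep: &str) -> String {
    let mut out = String::with_capacity(hay.len());
    let mut last = 0;
    for &(start, end) in spans {
        out.push_str(&hay[last..start]); // unmatched text before this match
        out.push_str(rep); // the replacement for this match
        last = end;
    }
    out.push_str(&hay[last..]); // trailing text after the final match
    out
}

fn main() {
    let hay = "abxd";
    // Hand-traced spans of x* in "abxd". The Rust crate skips the empty
    // match at 3..3 that directly follows the "x" match at 2..3.
    let rust_spans: &[(usize, usize)] = &[(0, 0), (1, 1), (2, 3), (4, 4)];
    let python_spans: &[(usize, usize)] = &[(0, 0), (1, 1), (2, 3), (3, 3), (4, 4)];
    println!("{}", replace_spans(hay, rust_spans, "-")); // -a-b-d-
    println!("{}", replace_spans(hay, python_spans, "-")); // -a-b--d-
}
```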

By the way, I am not sure whether this is an intentional difference or a bug.