BurntSushi/ripgrep

Adjacent replaced multiline matches result in wrong line numbers

meedstrom opened this issue · 4 comments

What version of ripgrep are you using?

ripgrep 13.0.0
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

How did you install ripgrep?

APT

What operating system are you using ripgrep on?

Kubuntu 23.10

Describe your bug.

This is similar to #2420, and I understand why that's WONTFIX. This is different though.

Using a multiline regexp, when the regexp matches strings that come immediately one after another, it bungles the line numbers of all of them. Easier to show you with a reproduction example:

What are the steps to reproduce the behavior?

Save a file test.txt containing:

:properties:
:id: fnord
:end:
:properties:
:id: boccob
:end:
:properties:
:id: d321fdddffff
:end:
:properties:
:id: clowns
:end:

Then run

rg -nU '^:properties:\n:id: (.*)\n:end:' -r '$1' test.txt

What is the actual behavior?

The result is

1:fnord
2:boccob
3:d321fdddffff
4:clowns

Only the first hit is correct.

You can see that the line numbers will be correctly reported if you modify the file to add a newline after each instance of ":end:".

What is the expected behavior?

Expected the result:

1:fnord
4:boccob
7:d321fdddffff
10:clowns
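
For reference, the expected numbers above can be derived independently of ripgrep. The following is only a minimal sketch in Python: it uses Python's `re` module rather than ripgrep's regex engine, and the helper name `expected_hits` is made up for illustration. It says nothing about how ripgrep computes line numbers internally; it just counts the newlines that precede each match in the original text.

```python
import re

# The pattern from the report. re.MULTILINE makes ^ match at the start of
# every line, which roughly mirrors how the pattern is used with rg -U here.
PATTERN = re.compile(r"^:properties:\n:id: (.*)\n:end:", re.MULTILINE)


def expected_hits(text):
    """Return (line_number, captured_id) for every match in text.

    The line number is that of the first line of the match in the
    original text: one plus the number of newlines before the match.
    """
    hits = []
    for m in PATTERN.finditer(text):
        line_no = text.count("\n", 0, m.start()) + 1
        hits.append((line_no, m.group(1)))
    return hits


# Rebuild the contents of test.txt from the reproduction steps.
ids = ["fnord", "boccob", "d321fdddffff", "clowns"]
test_txt = "".join(f":properties:\n:id: {i}\n:end:\n" for i in ids)

for line_no, value in expected_hits(test_txt):
    print(f"{line_no}:{value}")
```

Run against the contents of test.txt, this prints 1:fnord, 4:boccob, 7:d321fdddffff and 10:clowns, one per line, matching the expected result.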

> Using a multiline regexp, when the regexp matches strings that come immediately one after another, it bungles the line numbers of all of them.

This is an incomplete description of the problem you're reporting. It isn't enough for the regex to match adjacent strings; you also have to use the -r/--replace flag to replace matches with something else. This can be easily demonstrated by omitting the -r/--replace flag and observing that the line numbers are correct:

$ rg -nU '^:properties:\n:id: (.*)\n:end:' test.txt
1::properties:
2::id: fnord
3::end:
4::properties:
5::id: boccob
6::end:
7::properties:
8::id: d321fdddffff
9::end:
10::properties:
11::id: clowns
12::end:

Indeed though, the adjacency part of this seems important. If instead I use the following haystack as test2.txt:

:properties:
:id: fnord
:end:
ZZZ
:properties:
:id: boccob
:end:
ZZZ
:properties:
:id: d321fdddffff
:end:
ZZZ
:properties:
:id: clowns
:end:

Then the reported line numbers are correct, even with -r/--replace:

$ rg -nU '^:properties:\n:id: (.*)\n:end:' -r '$1' test2.txt
1:fnord
5:boccob
9:d321fdddffff
13:clowns

As you say, I forgot to mention the --replace. Changed the title.

I'm not proficient with Rust, but I could try to look for the problem. Any pointers on where to look?

Likely in grep-printer. Look at the "standard" printer.

That's where I would start anyway.