google/zoekt

LineMatches sometimes contain newlines

ijt opened this issue · 1 comment

ijt commented

The doc comment for the LineMatch type says

// LineMatch holds the matches within a single line in a file.

but it currently does not behave as documented: when the query pattern contains a newline, a single LineMatch can span several lines.
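The documented invariant can be stated as a simple check: no LineMatch's Line should contain a newline byte. The following is a minimal standalone sketch of that check. The LineMatch type here is a hypothetical, trimmed-down copy with only the Line field, and multiLineMatches is an illustrative helper, not part of zoekt.

```go
package main

import (
	"bytes"
	"fmt"
)

// LineMatch is a hypothetical, trimmed-down stand-in for zoekt's type,
// keeping only the field this check needs.
type LineMatch struct {
	Line []byte
}

// multiLineMatches (hypothetical helper) returns the indexes of LineMatches
// whose Line contains a newline, i.e. the ones that violate
// "the matches within a single line".
func multiLineMatches(lms []LineMatch) []int {
	var bad []int
	for i, lm := range lms {
		if bytes.IndexByte(lm.Line, '\n') >= 0 {
			bad = append(bad, i)
		}
	}
	return bad
}

func main() {
	// The single LineMatch reported below for the query "ine2\nbla".
	current := []LineMatch{{Line: []byte("line2\nbla")}}
	// The two LineMatches the doc comment implies.
	expected := []LineMatch{{Line: []byte("line2")}, {Line: []byte("bla")}}
	fmt.Println(multiLineMatches(current))  // [0]
	fmt.Println(multiLineMatches(expected)) // []
}
```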

Here is a test that runs a query containing a newline. It expects the resulting FileMatch structure to contain two LineMatches, one per line, but instead it gets back a single LineMatch containing two lines.

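// testIndexBuilder and searchForTest are helpers from zoekt's existing tests;
// this test also needs the reflect, testing, and query packages.
func TestQueryNewlines(t *testing.T) {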
	b := testIndexBuilder(t, nil,
		Document{Name: "filename", Content: []byte("line1\nline2\nbla")})

	sres := searchForTest(t, b, &query.Substring{Pattern: "ine2\nbla"})

	matches := sres.Files
	want := []FileMatch{{
		FileName: "filename",
		LineMatches: []LineMatch{
			{
				LineFragments: []LineFragmentMatch{{
					Offset:      7,
					LineOffset:  1,
					MatchLength: 4,
				}},
				Line:       []byte("line2"),
				LineStart:  6,
				LineEnd:    11,
				LineNumber: 2,
			},
			{
				// "bla" starts at byte 12: "line1\n" and "line2\n" are 6 bytes each.
				LineFragments: []LineFragmentMatch{{
					Offset:      12,
					LineOffset:  0,
					MatchLength: 3,
				}},
				Line:       []byte("bla"),
				LineStart:  12,
				LineEnd:    15,
				LineNumber: 3,
			},
		}}}

	if !reflect.DeepEqual(matches, want) {
		t.Errorf("got %v, want %v", matches, want)
	}
}
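To make the expected behavior concrete, here is a minimal standalone sketch of how one match could be split into one LineMatch per line it touches. The types are simplified, hypothetical copies of zoekt's (only the fields used in the test are modeled), and splitMatch is an illustrative helper, not zoekt's actual implementation. In this sketch, newline bytes inside the match are not counted toward any fragment's MatchLength.

```go
package main

import (
	"bytes"
	"fmt"
)

// Hypothetical, trimmed-down stand-ins for zoekt's types; only the fields
// used in the test above are modeled.
type LineFragmentMatch struct {
	LineOffset  int    // match start, relative to the start of the line
	Offset      uint32 // match start, relative to the start of the file
	MatchLength int
}

type LineMatch struct {
	Line          []byte
	LineStart     int // byte offset of the first byte of the line
	LineEnd       int // byte offset just past the line (its '\n' or EOF)
	LineNumber    int // 1-based
	LineFragments []LineFragmentMatch
}

// splitMatch (hypothetical helper) converts the match content[off:off+length]
// into one LineMatch per line it touches. Newline bytes inside the match are
// not counted in any fragment's MatchLength.
func splitMatch(content []byte, off, length int) []LineMatch {
	end := off + length
	// Start of the line containing off, and that line's 1-based number.
	lineStart := bytes.LastIndexByte(content[:off], '\n') + 1
	lineNumber := bytes.Count(content[:lineStart], []byte{'\n'}) + 1

	var out []LineMatch
	for lineStart < end {
		lineEnd := len(content)
		if i := bytes.IndexByte(content[lineStart:], '\n'); i >= 0 {
			lineEnd = lineStart + i
		}
		// Clip the match to the current line.
		fragStart, fragEnd := off, end
		if fragStart < lineStart {
			fragStart = lineStart
		}
		if fragEnd > lineEnd {
			fragEnd = lineEnd
		}
		if fragEnd > fragStart {
			out = append(out, LineMatch{
				Line:       content[lineStart:lineEnd],
				LineStart:  lineStart,
				LineEnd:    lineEnd,
				LineNumber: lineNumber,
				LineFragments: []LineFragmentMatch{{
					LineOffset:  fragStart - lineStart,
					Offset:      uint32(fragStart),
					MatchLength: fragEnd - fragStart,
				}},
			})
		}
		lineStart = lineEnd + 1
		lineNumber++
	}
	return out
}

func main() {
	content := []byte("line1\nline2\nbla")
	// "ine2\nbla" starts at byte 7 and is 8 bytes long.
	for _, lm := range splitMatch(content, 7, 8) {
		fmt.Printf("%d %q start=%d end=%d frag=%+v\n",
			lm.LineNumber, lm.Line, lm.LineStart, lm.LineEnd, lm.LineFragments[0])
	}
}
```

Running it prints one entry per line:

    2 "line2" start=6 end=11 frag={LineOffset:1 Offset:7 MatchLength:4}
    3 "bla" start=12 end=15 frag={LineOffset:0 Offset:12 MatchLength:3}

Dropping the newline from the fragments is a design choice of this sketch; an implementation could instead attribute it to the end of the earlier line.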

Here is the output of the test:

[ ~/src/github.com/ijt/zoekt ] go test ./...
--- FAIL: TestQueryNewlines (0.00s)
    index_test.go:214: got [{0  filename  [] [{[108 105 110 101 50 10 98 108 97] 6 15 2 false 0 [{1 7 8}]}] [] []    }], want [{0  filename  [] [{[108 105 110 101 50] 6 11 2 false 0 [{1 7 4}]} {[98 108 97] 13 16 3 false 0 [{0 13 3}]}] [] []    }]
FAIL
FAIL    github.com/google/zoekt 0.067s
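The Line fields in the failure message are printed as raw byte slices. A tiny Go program (only a decoding aid, not part of zoekt) shows what they contain: the returned line is "line2\nbla", newline included, whereas the test expects "line2" and "bla" as separate LineMatches. Comparing with the test's fields, the returned fragment {1 7 8} reads as LineOffset 1, Offset 7, MatchLength 8, i.e. it covers all 8 bytes of "ine2\nbla".

```go
package main

import "fmt"

// Byte slices copied from the Line fields in the failure message above.
var (
	gotLine   = []byte{108, 105, 110, 101, 50, 10, 98, 108, 97}
	wantLine2 = []byte{108, 105, 110, 101, 50}
	wantLine3 = []byte{98, 108, 97}
)

func main() {
	// %q prints a []byte as a quoted Go string, escaping the newline.
	fmt.Printf("%q\n", gotLine)   // "line2\nbla"
	fmt.Printf("%q\n", wantLine2) // "line2"
	fmt.Printf("%q\n", wantLine3) // "bla"
}
```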

Fixing this would make it much easier for Sourcegraph to support multiline searches.

I'm happy to contribute a fix.

ijt commented

I have a fix for this: https://github.com/google/zoekt/compare/master...ijt:newlines-one?expand=1. I know changes are meant to go through Gerrit, so I'll submit it there next.