Inconsistent results for a single query
dgryski opened this issue · 5 comments
Querying my corpus, I get the following:
$ for i in $(seq 10); do zoekt 'TODO\(dgryski\):' |wc -l; done
42
40
35
24
16
32
46
23
34
8
I would expect that results would be 100% of matching files.
Querying from the web interface, I get
Found 86 results in 54 files, showing top 50 files
Using rg
as a source of truth,
$ rg 'TODO\(dgryski\):' |wc -l
84
The 0-valued zoekt.SearchOptions
passed from the command-line search causes the query to be cancelled as soon as one shard returns results.
We can fix by checking if TotalMaxMatchCount > 0
before using it to cancel the query.
diff --git a/shards/shards.go b/shards/shards.go
index 17652f7..c855087 100644
--- a/shards/shards.go
+++ b/shards/shards.go
@@ -189,7 +189,7 @@ func (ss *shardedSearcher) Search(ctx context.Context, q query.Q, opts *zoekt.Se
}
}
- if cancel != nil && aggregate.Stats.MatchCount > opts.TotalMaxMatchCount {
+ if cancel != nil && opts.TotalMaxMatchCount > 0 && aggregate.Stats.MatchCount > opts.TotalMaxMatchCount {
cancel()
cancel = nil
}
thanks for the analysis. Yes, that looks wrong. I'll bake a proper fix (it needs a test.)
separately, there is the question what the CLI interface should be. Currently, it will order the results like the Web UI, but maybe it's better if it returns exhaustive results. What do you think?
Given that the command line has all the other text processing tools to construct pipelines, I think exhaustive results make sense as the default or easily accessible with an option.