google/zoekt

Inconsistent results for a single query

dgryski opened this issue · 5 comments

Querying my corpus, I get the following:

$ for i in $(seq 10); do zoekt 'TODO\(dgryski\):' |wc -l; done
      42
      40
      35
      24
      16
      32
      46
      23
      34
       8

I would expect that results would be 100% of matching files.

Querying from the web interface, I get

Found 86 results in 54 files, showing top 50 files 

Using rg as a source of truth,

$ rg 'TODO\(dgryski\):'  |wc -l
      84

The 0-valued zoekt.SearchOptions passed from the command-line search causes the query to be cancelled as soon as one shard returns results.

We can fix by checking if TotalMaxMatchCount > 0 before using it to cancel the query.

diff --git a/shards/shards.go b/shards/shards.go
index 17652f7..c855087 100644
--- a/shards/shards.go
+++ b/shards/shards.go
@@ -189,7 +189,7 @@ func (ss *shardedSearcher) Search(ctx context.Context, q query.Q, opts *zoekt.Se
                        }
                }

-               if cancel != nil && aggregate.Stats.MatchCount > opts.TotalMaxMatchCount {
+               if cancel != nil && opts.TotalMaxMatchCount > 0 && aggregate.Stats.MatchCount > opts.TotalMaxMatchCount {
                        cancel()
                        cancel = nil
                }

thanks for the analysis. Yes, that looks wrong. I'll bake a proper fix (it needs a test.)

separately, there is the question what the CLI interface should be. Currently, it will order the results like the Web UI, but maybe it's better if it returns exhaustive results. What do you think?

Given that the command line has all the other text processing tools to construct pipelines, I think exhaustive results make sense as the default or easily accessible with an option.