Inconsistent results for a single query

Question

Inconsistent results for a single query

dgryski opened this issue 6 years ago · 5 comments

Querying my corpus, I get the following:

$ for i in $(seq 10); do zoekt 'TODO\(dgryski\):' |wc -l; done
      42
      40
      35
      24
      16
      32
      46
      23
      34
       8

I would expect that results would be 100% of matching files.

Answer 1 · 2018-12-07T18:55:13.000Z

Querying from the web interface, I get

Found 86 results in 54 files, showing top 50 files

Using rg as a source of truth,

$ rg 'TODO\(dgryski\):'  |wc -l
      84

Answer 2 · 2018-12-07T19:24:34.000Z

The 0-valued zoekt.SearchOptions passed from the command-line search causes the query to be cancelled as soon as one shard returns results.

We can fix by checking if TotalMaxMatchCount > 0 before using it to cancel the query.

diff --git a/shards/shards.go b/shards/shards.go
index 17652f7..c855087 100644
--- a/shards/shards.go
+++ b/shards/shards.go
@@ -189,7 +189,7 @@ func (ss *shardedSearcher) Search(ctx context.Context, q query.Q, opts *zoekt.Se
                        }
                }

-               if cancel != nil && aggregate.Stats.MatchCount > opts.TotalMaxMatchCount {
+               if cancel != nil && opts.TotalMaxMatchCount > 0 && aggregate.Stats.MatchCount > opts.TotalMaxMatchCount {
                        cancel()
                        cancel = nil
                }

Answer 3 · 2018-12-10T13:24:49.000Z

thanks for the analysis. Yes, that looks wrong. I'll bake a proper fix (it needs a test.)

Answer 4 · 2018-12-10T13:38:20.000Z

separately, there is the question what the CLI interface should be. Currently, it will order the results like the Web UI, but maybe it's better if it returns exhaustive results. What do you think?

Answer 5 · 2018-12-10T14:17:18.000Z

Given that the command line has all the other text processing tools to construct pipelines, I think exhaustive results make sense as the default or easily accessible with an option.