Optimization: filter + scoring by timestamp

Question

Opened this issue 2 months ago · 2 comments

fulmicoton commented 2 months ago

There are several ideas we could leverage to sort by timestamp.

When we sort by timestamp AND filter on a timestamp range, we end up fetching the timestamp twice.

The timestamp is often almost sorted.
A small amount of metadata could make it possible to restrict the query to a doc range:

a timestamp within [t_start, t_end] implies a doc in [doc_a, doc_b]
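
To make the metadata idea concrete, here is a rough sketch of one possible shape for it: a min/max timestamp per block of docs, used to turn a timestamp range into a candidate doc range. Everything here (`BlockMinMax`, `build`, `candidate_docs`, the block length) is hypothetical and not an existing tantivy API; timestamps are assumed to be plain `i64` values indexed by doc id.

```rust
use std::ops::Range;

/// Hypothetical metadata: (min, max) timestamp for each block of `block_len` docs.
struct BlockMinMax {
    block_len: usize,
    num_docs: usize,
    min_max: Vec<(i64, i64)>,
}

impl BlockMinMax {
    /// Builds the metadata in one pass over the timestamps (indexed by doc id).
    fn build(timestamps: &[i64], block_len: usize) -> Self {
        assert!(block_len > 0);
        let min_max = timestamps
            .chunks(block_len)
            .map(|block| {
                // `chunks` never yields an empty block, so min/max always exist.
                let min = *block.iter().min().unwrap();
                let max = *block.iter().max().unwrap();
                (min, max)
            })
            .collect();
        BlockMinMax { block_len, num_docs: timestamps.len(), min_max }
    }

    /// Returns a half-open doc range that contains every doc whose timestamp
    /// lies in [t_start, t_end]. The range may also contain non-matching docs,
    /// so the caller still has to check each candidate.
    fn candidate_docs(&self, t_start: i64, t_end: i64) -> Range<usize> {
        // First and last block whose [min, max] interval overlaps the query.
        let first = self
            .min_max
            .iter()
            .position(|&(min, max)| min <= t_end && max >= t_start);
        let last = self
            .min_max
            .iter()
            .rposition(|&(min, max)| min <= t_end && max >= t_start);
        match (first, last) {
            (Some(first), Some(last)) => {
                let doc_a = first * self.block_len;
                let doc_b = ((last + 1) * self.block_len).min(self.num_docs);
                doc_a..doc_b
            }
            // No block overlaps the query: nothing to scan.
            _ => 0..0,
        }
    }
}
```

When the timestamps are almost sorted, the overlapping blocks are contiguous or nearly so, and the candidate range stays close to the true match range. The linear walk over the blocks is only for brevity.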

Answer 1 · 2024-04-24T11:29:14.000Z

I think this is similar to what I wrote here recently (quickwit-oss/tantivy#2352 (comment)):

I've been wondering whether we should flag fast fields as almost sorted during creation (e.g. almost sorted within a window of 100 values) and then use that information to do a binary_search followed by a scan of about 100 values.

The almost sorted check could be done during serialization and should not cost much.
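
For illustration, here is a minimal sketch of how the check and the lookup could fit together. It assumes one specific definition of "almost sorted within a window of `window` values": `values[i] <= values[j]` whenever `j >= i + window`. Under that assumption, every `window`-th value forms a sorted sequence that can be binary searched. All function names are hypothetical and the column is modeled as a plain `&[i64]` slice, not a tantivy fast field.

```rust
use std::ops::Range;

/// Single-pass check that could run during serialization.
/// Returns true if values[i] <= values[j] whenever j >= i + window.
fn is_almost_sorted(values: &[i64], window: usize) -> bool {
    assert!(window > 0);
    // Running max of values[0..=i - window].
    let mut max_before = i64::MIN;
    for i in window..values.len() {
        max_before = max_before.max(values[i - window]);
        if values[i] < max_before {
            return false;
        }
    }
    true
}

/// Binary search over 0..len: first index for which `pred` is false.
fn partition_point(len: usize, pred: impl Fn(usize) -> bool) -> usize {
    let (mut lo, mut hi) = (0, len);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if pred(mid) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    lo
}

/// Doc range guaranteed to contain every doc with a value in [t_start, t_end],
/// assuming `is_almost_sorted(values, window)` holds.
fn candidate_doc_range(values: &[i64], window: usize, t_start: i64, t_end: i64) -> Range<usize> {
    // Samples values[0], values[window], values[2 * window], ... are sorted.
    let num_samples = (values.len() + window - 1) / window;
    let first_ge_start = partition_point(num_samples, |m| values[m * window] < t_start);
    let first_gt_end = partition_point(num_samples, |m| values[m * window] <= t_end);
    // Widen by a couple of windows on each side to cover out-of-order values.
    let doc_b = ((first_gt_end + 1) * window).min(values.len());
    let doc_a = (first_ge_start.saturating_sub(2) * window).min(doc_b);
    doc_a..doc_b
}

/// Binary search plus a short scan: exact doc ids with a value in [t_start, t_end].
fn docs_in_range(values: &[i64], window: usize, t_start: i64, t_end: i64) -> Vec<usize> {
    candidate_doc_range(values, window, t_start, t_end)
        .filter(|&doc| (t_start..=t_end).contains(&values[doc]))
        .collect()
}
```

The binary search reads only O(log n) values, and the scan touches the matching docs plus at most a few windows of extra values on either side. The flag computed by `is_almost_sorted` would need to be stored with the column for this to be usable at query time.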

Answer 2 · 2024-04-25T00:19:55.000Z

Yes. Let's keep that for later though. :)