pisa-engine/pisa

Refactor query term weights

JMMackenzie opened this issue · 0 comments

Currently, term weighting is handled within the cursor classes. In particular, the ScoredCursor class stores the query term weight (the weight assigned to a term at query time, usually set to 1.0 but settable on a per-term basis by the user), and this weight can be read from the cursor with the query_weight() function.

Each cursor applies this weight behind the scenes: rather than returning the raw ranking function output for a document/term pair, it returns that output multiplied by the term weight. This all happens "silently" inside the cursor, so nothing special needs to be done in the algorithm itself. The same goes for the upper-bound scores, which are multiplied by the term weight before being stored (see #467 for more information).
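
To make the current behavior concrete, here is a minimal sketch of a cursor that folds the weight into both the score and the stored upper bound. This is not PISA's actual ScoredCursor: the class name `WeightedScoredCursor`, the `TermScorer` alias, and the constructor parameters are hypothetical and only illustrate the idea described above.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-in for a ranking function: maps (docid, freq) to a score.
using TermScorer = std::function<float(std::uint32_t, std::uint32_t)>;

// Simplified, hypothetical scored cursor (not the real PISA class).
class WeightedScoredCursor {
  public:
    WeightedScoredCursor(TermScorer scorer, float query_weight, float unweighted_max_score)
        : m_scorer(std::move(scorer)),
          m_query_weight(query_weight),
          // The upper bound is multiplied by the weight once, before being stored.
          m_max_score(query_weight * unweighted_max_score) {}

    // Returns the ranking function output multiplied by the query term weight.
    float score(std::uint32_t docid, std::uint32_t freq) const {
        return m_query_weight * m_scorer(docid, freq);
    }

    // Already weighted; callers do not multiply again.
    float max_score() const { return m_max_score; }

    float query_weight() const { return m_query_weight; }

  private:
    TermScorer m_scorer;
    float m_query_weight;
    float m_max_score;
};
```

With this shape, an algorithm only ever calls `score()` and `max_score()` and never needs to know that a weight exists.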

The problem is that the block_max approaches do not follow this pattern: the cursor returns the unweighted block_max score, and the weight multiplication is handled directly by the algorithm. See the following example:

ordered_cursors[i]->block_max_score() * ordered_cursors[i]->query_weight();

I think the desired behavior would be for the block-max scored cursor's block_max_score() call to multiply by the query term weight before returning. That is, modifying the following line:
https://github.com/pisa-engine/pisa/blob/master/include/pisa/cursor/block_max_scored_cursor.hpp#L32

We would then need to rework each of the block_max algorithms to remove their explicit weight multiplication (since it will be done inside the cursor). Everything should then work as expected.
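
As a rough sketch of the proposed change (assuming a much simplified cursor, not the real code in block_max_scored_cursor.hpp), the two variants might look like this. The struct `BlockMaxCursorSketch`, the accessor `block_max_score_unweighted()`, and both `block_upper_bound_*` helpers are hypothetical names used only for illustration.

```cpp
#include <vector>

// Hypothetical, simplified block-max cursor holding one block's values.
struct BlockMaxCursorSketch {
    float query_weight;          // per-term weight set at query time
    float unweighted_block_max;  // raw block-max score from the index

    // Current behavior as described above: the raw value is returned.
    float block_max_score_unweighted() const { return unweighted_block_max; }

    // Proposed behavior: the weight is applied inside the cursor.
    float block_max_score() const { return query_weight * unweighted_block_max; }
};

// Before: the algorithm multiplies by the weight explicitly.
inline float block_upper_bound_before(std::vector<BlockMaxCursorSketch> const& cursors) {
    float bound = 0.0f;
    for (auto const& cursor : cursors) {
        bound += cursor.block_max_score_unweighted() * cursor.query_weight;
    }
    return bound;
}

// After: the algorithm just sums what the cursor returns.
inline float block_upper_bound_after(std::vector<BlockMaxCursorSketch> const& cursors) {
    float bound = 0.0f;
    for (auto const& cursor : cursors) {
        bound += cursor.block_max_score();
    }
    return bound;
}
```

Both helpers compute the same bound; the difference is only where the multiplication lives. Keeping an explicitly named unweighted accessor next to the weighted one is one possible way to leave the raw value reachable for callers that want it.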

The main point of this issue is to discuss whether having the term and impact weights "coupled" tightly makes sense, and whether there is ever a case where we might not want this tight coupling. My expectation is that coupling will make things much simpler, and if a user ever wanted to decouple them, we could implement additional _unweighted versions of each function and expose them through the cursor.

@elshize and @amallia -- What do you think?