pisa-engine/pisa

`Query::term_weights` is not assigned

elshize opened this issue ยท 11 comments

Query::term_weights seems to be never assigned. The --weighted flag in queries and evaluate_queries programs ignore those, and instead resolve weights when creating cursors.

I noticed this recently when working on the Python integration. Fixing this would simplify the integration since we wouldn't need to resort to term repetition to apply the weighting.

@seanmacavaney thanks for letting us know. Just to be clear, by fixing you mean that term_weights should be used and should affect the query processing, right?

I have some other stuff open now that I want to merge first, but I can look closer into this after that.

Yes. I expected that term_weights would be honoured during query processing, but they appear to only be used in intersection.

The issue isn't urgent though. Query term repetition is good enough for the time being.

I expected that term_weights would be honoured during query processing

Yeah, I was also surprised :)

I always figured we had term_weights for future proofing in case we wanted to support query inputs like: hello:20 world:10 or something like.

Would it be more or less painful to have a vector of pairs/structs? It seems a bit tedious/error prone to have separate vectors accessed by index, at least in my opinion.

Thinking something like:

struct query_term {
  uint32_t term_id;
  double weight;
   ...
};

Would it be more or less painful to have a vector of pairs/structs?

Possibly. This sounds like a good idea, but would have to look at the code again to see if there's anything preventing (or discouraging) that.

Amazing -- thanks for the fix @elshize!

@seanmacavaney please note that this is quite a rewrite around query parsing/handling. Not sure how much that would affect your Python binding once you upgrade.

We would love to provide some better stability in our APIs, but I'm currently actively trying to improve multiple parts of the library, so it will get worse before it gets better unfortunately.

If at any point you have any questions or issues, I'd be more than happy to help with any future upgrades.

Thanks for letting me know. So it's best to hold off on any changes to the Python integration until the API stabilises a bit.

Are there some specific gh issues that you recommend I subscribe to to help keep an eye on this progress?

Are there some specific gh issues that you recommend I subscribe to to help keep an eye on this progress?

Not really, but it may be a good idea to open a tracking issue. Let me think briefly on how to best organize it, and I'll let you know.

@seanmacavaney I created an issue, not much there now, but you can subscribe to get updates: #569