[Feature Request] Use WAND Top-K Retrieval

Question

hockyy opened this issue 2 years ago · 3 comments

@inproceedings{petri2013exploring,
  title={Exploring the magic of WAND},
  author={Petri, Matthias and Culpepper, J Shane and Moffat, Alistair},
  booktitle={Proceedings of the 18th Australasian Document Computing Symposium},
  pages={58--65},
  year={2013}
}

I believe that if you're using an inverted index with token-to-document posting lists, the WAND top-k retrieval algorithm can speed up retrieval for small K over large document collections. I'm not sure whether it's relevant to this project. I once implemented it here: https://raw.githubusercontent.com/hockyy/ir-pa-2/main/bsbi.py
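For readers unfamiliar with the idea, here is a minimal sketch of WAND-style pivoting over posting lists. It is not the code from the linked `bsbi.py`, and it is not this project's retriever. The names `Cursor` and `wand_top_k` are hypothetical, the postings are assumed to be plain Python lists of `(doc_id, score)` pairs sorted by `doc_id`, and scores are assumed to be non-negative (as with BM25-like weights) so that per-term maxima are valid upper bounds.

```python
import heapq
from bisect import bisect_left


class Cursor:
    """Read position in one term's posting list (hypothetical helper)."""

    def __init__(self, plist):
        self.docs = [d for d, _ in plist]      # doc ids, ascending
        self.scores = [s for _, s in plist]    # per-document term scores
        self.ub = max(self.scores)             # upper bound for this term
        self.pos = 0

    def doc(self):
        return self.docs[self.pos]


def wand_top_k(postings, k):
    """Return up to k (score, doc_id) pairs, best first.

    postings: dict mapping term -> list of (doc_id, score), sorted by doc_id.
    Assumes non-negative scores.
    """
    if k <= 0:
        return []
    cursors = [Cursor(p) for p in postings.values() if p]
    heap = []  # min-heap holding the current top-k as (score, doc_id)
    while True:
        # Drop exhausted lists, then order the rest by current doc id.
        cursors = [c for c in cursors if c.pos < len(c.docs)]
        if not cursors:
            break
        cursors.sort(key=Cursor.doc)
        theta = heap[0][0] if len(heap) == k else float("-inf")

        # Pivot: first cursor at which the summed upper bounds beat theta.
        acc, pivot = 0.0, None
        for i, c in enumerate(cursors):
            acc += c.ub
            if acc > theta:
                pivot = i
                break
        if pivot is None:
            break  # no remaining document can enter the top-k
        pivot_doc = cursors[pivot].doc()

        if cursors[0].doc() == pivot_doc:
            # All cursors up to the pivot sit on pivot_doc: score it fully.
            score = 0.0
            for c in cursors:
                if c.doc() != pivot_doc:
                    break
                score += c.scores[c.pos]
                c.pos += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot_doc))
        else:
            # Documents before pivot_doc cannot beat theta: skip ahead.
            c = cursors[0]
            c.pos = bisect_left(c.docs, pivot_doc, c.pos)
    return sorted(heap, key=lambda x: (-x[0], x[1]))
```

The potential saving comes from the `else` branch: documents whose summed upper bounds cannot exceed the current threshold are skipped without being scored. Whether that beats a scan of every posting depends on the data and on how the scoring itself is implemented.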

Answer 1 · 2022-12-08T13:30:44.000Z

Hi, thanks for the suggestion and code!

I have an implementation of another top-k retrieval optimization on a local branch. Unfortunately, it slows retrieval down, I suspect because it executes more instructions, even though they are applied to less data.
The current implementation relies heavily on vectorized computations, which are well optimized on modern CPUs.

I will let you know if WAND improves efficiency over the current implementation.

Answer 2 · 2022-12-22T08:37:57.000Z

Hi, I have a working WAND implementation, but it is slower than the brute-force vector operations.
I am now considering more advanced WAND-based approaches. I hope to add one soon.

Answer 3 · 2023-07-07T07:37:39.000Z

Unfortunately, I don't think this will happen anytime soon. The lexical retriever is already reasonably efficient, and there are other things I prefer to prioritize.

I will close this issue for now.