[Feature Request] Use WAND Top-K Retrieval
hockyy opened this issue · 3 comments
@inproceedings{petri2013exploring,
title={Exploring the magic of WAND},
author={Petri, Matthias and Culpepper, J Shane and Moffat, Alistair},
booktitle={Proceedings of the 18th Australasian Document Computing Symposium},
pages={58--65},
year={2013}
}
I believe if you're using inverted index and token - docs list, using the WAND Top-K Retrieval Algorithm can speedup retrieval for small K in large documents. I'm not sure whether it's relevant to this project. I've once implemented this https://raw.githubusercontent.com/hockyy/ir-pa-2/main/bsbi.py
Hi, thanks for the suggestion and code!
I have an implementation of another optimization algorithm for top-k retrieval on my local branch. Unfortunately, it slows down the retrieval because (I suspect) it requires more instructions to be executed (even if they are applied to less data).
Current implementation heavily relies on vector computations, which are fairly optimized on modern CPUs.
I will let you know if WAND improves efficiency over the current implementation.
Hi, I have a working WAND implementation, but it is slower than brute force vector operations.
I am now considering more advanced WAND-based approaches. I hope to add one soon.
Unfortunately, I don't think this will happen anytime soon. The lexical retriever is already reasonably efficient, and there are other things I prefer to prioritize.
I will close for now.