BM25F

The BM25F ranking formula extends the BM25 ranking formula to documents with several fields (see Wikipedia BM25F, or this article by Perez-Iglesias et al.).
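As a rough illustration of what "several fields" means for the score, here is a minimal sketch of the simplified BM25F for a single query term: each field contributes a boosted, length-normalised term frequency, the contributions are summed into one pseudo-frequency, and that sum is saturated with k1 and multiplied by the idf. This is not the code in this repository. The class name, method names and array layout are hypothetical, and the idf uses the non-negative "1 +" variant, which is an assumption rather than something taken from the article.

```java
// Minimal BM25F sketch for one query term (illustrative only, not the Solr plugin code).
final class Bm25fSketch {

    // Sum over fields of boost * tf / ((1 - b) + b * fieldLength / avgFieldLength).
    // All arrays are indexed by field and are assumed to have the same length.
    static double pseudoFrequency(double[] boost, double[] b, double[] avgLen,
                                  int[] tf, int[] len) {
        double weight = 0.0;
        for (int i = 0; i < tf.length; i++) {
            double norm = (1.0 - b[i]) + b[i] * (len[i] / avgLen[i]);
            weight += boost[i] * tf[i] / norm;
        }
        return weight;
    }

    // Assumption: non-negative idf variant, log(1 + (N - df + 0.5) / (df + 0.5)).
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Saturate the combined pseudo-frequency with k1, then scale by idf.
    static double score(double k1, double idf, double weight) {
        return idf * weight / (k1 + weight);
    }

    public static void main(String[] args) {
        // Two hypothetical fields, e.g. "title" (boost 2.0) and "body" (boost 1.0).
        double[] boost = {2.0, 1.0};
        double[] b = {0.5, 0.75};
        double[] avgLen = {8.0, 120.0};
        int[] tf = {1, 3};      // occurrences of the term in each field
        int[] len = {6, 150};   // length of each field in this document
        double w = pseudoFrequency(boost, b, avgLen, tf, len);
        System.out.println(score(1.2, idf(1000, 40), w));
    }
}
```

For a multi-term query the per-term scores would be summed, which is the part the TODO below still lists as unfinished in the plugin.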

BM25F for Solr

I first wrote BM25F for Solr in 2010, while I was collaborating with Europeana, and then upgraded it several times (with help from Yorgos Mamakis) to newer versions of Solr, but I never submitted a patch (my bad, I was shy). I upgraded the old code to the Solr 6 interface during the Lucene4IR Hackathon and the London Lucene Solr Meetup Hackathon.

TODO

  • Together with Henry Cleland, I ported the BM25F ranking function for single-term queries. The BM25F boolean (multi-term) query still needs to be fixed and tested. The code that still has to be fixed is commented out in the repo;
  • explain() can be improved, as can the code in general: some methods and variables are unused, final modifiers can be added, ...;
  • More unit tests can be added, adapting them from the old ones (available in the old repo);
  • Improve the documentation; again, I had some documentation in the old repo.

If you want to work on this, feel free to reach me at my email address: diego [dot] ceccarelli [at] gmail [dot] com