AmenRa/retriv

Compare retriv's performance to rank_bm25 and pyserini

MarshtompCS opened this issue · 4 comments

Hi! I see that retriv's speed is really impressive in speed.md. Did you also compare their performance?

AmenRa commented

Hi, performance should be roughly the same for pyserini and retriv.
pyserini is built on top of Lucene, and retriv's BM25 implementation is based on Elasticsearch, which is also built on top of Lucene. The only difference could be the BM25 hyper-parameter settings: retriv uses Elasticsearch's defaults out-of-the-box, while pyserini probably uses Lucene's. Text pre-processing could also differ slightly. In the end, you can configure them to behave the same, and they should both perform similarly out-of-the-box.
I don't know about rank_bm25; I've never looked at its source code.
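
For reference, here is a minimal, self-contained sketch of textbook BM25 scoring, just to show where the k1 and b hyper-parameters enter. This is not retriv's or Pyserini's actual code; production implementations like Lucene's differ in details such as IDF smoothing and how length normalization is encoded.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len, k1=1.2, b=0.75):
    """Score one document against a query with textbook BM25.

    doc_freqs maps a term to the number of documents containing it.
    k1 controls term-frequency saturation; b controls length normalization.
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in set(query_terms):
        if term not in tf:
            continue
        df = doc_freqs.get(term, 0)
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))  # Lucene-style IDF
        norm_tf = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm_tf
    return score

# Toy usage with a two-document collection.
docs = [["the", "quick", "brown", "fox"], ["a", "lazy", "brown", "dog"]]
doc_freqs = Counter(term for doc in docs for term in set(doc))
avg_len = sum(len(d) for d in docs) / len(docs)
print(bm25_score(["brown", "fox"], docs[0], doc_freqs, len(docs), avg_len))
```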

MarshtompCS commented

I think it is really necessary to compare performance on actual datasets. pyserini's authors pointed out that there are many weak BM25 implementations, leading to poor results: https://arxiv.org/pdf/2104.05740.pdf

AmenRa commented

The main problem with BM25 baselines is that most people do not optimize its hyper-parameters when performing comparisons. That's one of the main motivations behind retriv having a feature that lets you do that very easily (see the sketch below).
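
As an illustration of the idea only (this is not retriv's tuning feature), here is a hedged sketch of a brute-force grid search over k1 and b using rank_bm25, which was asked about above, evaluated with MRR@10 on a handful of labelled queries. The `corpus`, `queries`, and `qrels` variables are toy placeholders you would replace with your dataset.

```python
from itertools import product
from rank_bm25 import BM25Okapi

# Toy placeholders: tokenized documents, tokenized queries,
# and qrels mapping each query id to the index of its relevant document.
corpus = [doc.split() for doc in ["hello world", "the bm25 ranking function", "lucene and elasticsearch"]]
queries = {"q1": "bm25 function".split(), "q2": "elasticsearch".split()}
qrels = {"q1": 1, "q2": 2}

def mrr_at_10(bm25):
    """Mean reciprocal rank at cutoff 10 over the toy query set."""
    total = 0.0
    for qid, query in queries.items():
        scores = bm25.get_scores(query)
        ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:10]
        if qrels[qid] in ranking:
            total += 1.0 / (ranking.index(qrels[qid]) + 1)
    return total / len(queries)

# Exhaustive grid search over a few candidate (k1, b) pairs.
best = max(
    ((k1, b, mrr_at_10(BM25Okapi(corpus, k1=k1, b=b)))
     for k1, b in product([0.9, 1.2, 1.5], [0.4, 0.6, 0.75])),
    key=lambda t: t[2],
)
print(f"best k1={best[0]}, b={best[1]}, MRR@10={best[2]:.3f}")
```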

Regarding performance, as of now, this is how retriv and Pyserini perform out-of-the-box:

| Dataset | Metric | retriv | Pyserini |
| --- | --- | --- | --- |
| MSMARCO Dev | MRR@10 | 0.185 | 0.184 |
| MSMARCO Dev | Recall | 0.873 | 0.853 |
| TREC DL 2019 | NDCG@10 | 0.479 | 0.506 |
| TREC DL 2019 | Recall | 0.753 | 0.750 |
| TREC DL 2020 | NDCG@10 | 0.496 | 0.480 |
| TREC DL 2020 | Recall | 0.811 | 0.786 |

The differences you see are mainly due to the two libraries' different default BM25 hyper-parameter settings and to slightly different text pre-processing pipelines.
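
For anyone who wants to reproduce numbers like the ones above, the evaluation step might look like the hedged sketch below using the ranx library (an assumption on my part; the thread does not say which evaluation tool was used). The qrels and run dictionaries are toy placeholders; in practice you would load the MSMARCO / TREC DL judgments and the run produced by retriv or Pyserini.

```python
from ranx import Qrels, Run, evaluate

# Toy placeholders: relevance judgments and a retrieval run ({query_id: {doc_id: score}}).
qrels = Qrels({"q1": {"d1": 1}, "q2": {"d7": 1}})
run = Run({
    "q1": {"d1": 12.3, "d2": 9.1, "d5": 4.2},
    "q2": {"d3": 8.4, "d7": 7.9},
})

# Computes the same family of metrics reported above.
print(evaluate(qrels, run, ["mrr@10", "ndcg@10", "recall@100"]))
```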

That's great! Thanks for reporting this!