Where is BM25 introduced?
Hi,
For the warm-up step, I see a regular dense retrieval model being trained on the triples.small data provided by MS MARCO.
But I can't find any code that builds a BM25 index or samples BM25 negatives.
I guess you are treating the negatives in the triples.small data as BM25 negatives already?
What does "BM25 warm-up" mean, and where is it introduced?
Thanks
Yeah, I also can't find the BM25 index. Have you found the answer to it?
+1
I believe @tangzhy is correct (at least on MS MARCO): triples.train.small.tsv was generated by the MS MARCO dataset authors themselves, and the raw text of their README refers to generating the triples using BM25. That is why there's no reference to BM25 in this repo.
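For anyone else landing here, a minimal sketch of what consuming those precomputed triples might look like. It assumes each line of the file is tab-separated as query, positive passage, negative passage. The `read_triples` helper and the sample rows are made up for illustration and are not taken from this repo. If the negatives really were sampled with BM25 upstream, as the README suggests, the training code only has to read the third column and never needs to touch a BM25 index itself.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class Triple(NamedTuple):
    query: str
    positive: str
    negative: str  # assumed to be a BM25-sampled negative, per the MS MARCO README


def read_triples(lines: Iterable[str]) -> Iterator[Triple]:
    """Yield (query, positive, negative) from tab-separated lines."""
    # QUOTE_NONE so that quote characters inside passages are kept literally.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip malformed rows
        yield Triple(*row)


# Made-up rows standing in for triples.train.small.tsv (hypothetical content).
sample = io.StringIO(
    "what is bm25\tBM25 is a ranking function.\tThe BMW 25 is a car.\n"
    "capital of france\tParis is the capital of France.\tFrance borders Spain.\n"
)
triples = list(read_triples(sample))
```

With the real file you would pass an open file handle instead of the in-memory sample, e.g. `read_triples(open(path, encoding="utf-8"))`.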