terrier-org/pyterrier

Tuning BM25F parameters

Closed this issue · 9 comments

Hi @cmacdonald,

I was trying to tune BM25F parameters. Per the documentation, BM25F is implemented as described by [Zaragoza TREC-2004]. In Zaragoza's paper, there are 'b' and 'w' parameters per field, and one global 'k1' parameter. My questions are as follows:

  1. I figured out that the 'b' parameter is actually named 'c' in Terrier, and 'w' corresponds to 'w.i', where i is the field number (starting from 0). Is this mapping correct?
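import pyterrier as pt

# `index` is an existing index that was built with field information.
b = 1  # same normalisation value for both fields
bm25f = pt.BatchRetrieve(
    index,
    wmodel='BM25F',
    controls={'w.0': 1.0, 'w.1': 0.5,  # per-field weights
              'c.0': b, 'c.1': b},     # per-field normalisation ('b' in the paper)
    verbose=True)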
  2. For the 'k1' parameter, I could not find the corresponding name. Could you please let me know what it is?

Just copying my supervisor Dr. @JMMackenzie

(1) yes, this looks right
(2) I don't think we have ever tuned k1 in BM25F. 6 parameters was always enough!

Thanks @cmacdonald, for your reply!
What are the 6 parameters?

> What are the 6 parameters?

The normalisation value for each field, i.e. b (named c in Terrier), plus the weight for each field.
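
For readers following along, here is a minimal sketch of the simple BM25F formula, showing where the per-field b and w values and the single global k1 enter. It is plain Python written for illustration and is not Terrier's implementation; the function name, the idf variant, and the default `k1=1.2` are all assumptions.

```python
import math

def bm25f_score(query_terms, doc_fields, avg_field_len, df, n_docs, w, b, k1=1.2):
    """Illustrative simple BM25F score for one document.

    doc_fields    : list of token lists, one per field
    avg_field_len : average length of each field across the collection
    df            : dict mapping term -> document frequency
    w, b          : per-field weight and length-normalisation values
    k1            : single global saturation parameter (assumed default)
    """
    score = 0.0
    for term in query_terms:
        pseudo_tf = 0.0
        for f, tokens in enumerate(doc_fields):
            tf = tokens.count(term)
            if tf == 0:
                continue
            # Per-field length normalisation, controlled by b[f].
            norm = 1.0 + b[f] * (len(tokens) / avg_field_len[f] - 1.0)
            # Per-field weight w[f] scales the normalised frequency.
            pseudo_tf += w[f] * tf / norm
        if pseudo_tf == 0.0:
            continue
        # Robertson/Sparck Jones style idf; the exact variant is an assumption.
        idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
        # Saturation is applied once, to the combined pseudo-frequency.
        score += pseudo_tf / (k1 + pseudo_tf) * idf
    return score
```

With two fields this gives four field-level parameters (`w[0]`, `w[1]`, `b[0]`, `b[1]`) plus `k1`, which matches the mapping to 'w.i' and 'c.i' discussed above.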

Hey Craig, thanks for the help!

Just double checking - does this mean your (Terrier) BM25F doesn't include k? Or it's just not exposed?

Pardon @cmacdonald, but which normalization parameters are exposed in PyTerrier, other than 'c'?
I tried setting 'b' and 'b.0' to multiple values, but none of them changed the retrieval performance.
If I am not mistaken, the exposed parameters are just 'c' and the weight for each field. Please correct me if I am wrong.

> If I am not mistaken, the exposed parameters are just 'c' and the weight for each field. Please correct me if I am wrong.

I'm not sure I follow the question. For BM25F, this is correct, right...?

bm25f = pt.BatchRetrieve(
    index,
    wmodel='BM25F',
    controls={'w.0': 1.0, 'w.1': 0.5,
              'c.0': b, 'c.1': b},
    verbose=True)
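
Since the thread is about tuning, here is a small sketch of how the candidate settings could be enumerated before trying each one. The helper `bm25f_control_grid` is hypothetical (it is not part of PyTerrier), and the value ranges are arbitrary placeholders; it only builds dictionaries in the same 'w.i' / 'c.i' shape used in the snippet above.

```python
from itertools import product

def bm25f_control_grid(n_fields, c_values, w_values):
    """Yield one controls dict per combination of per-field c.i and w.i values.

    Hypothetical helper for illustration; not a PyTerrier function.
    """
    c_keys = [f"c.{i}" for i in range(n_fields)]
    w_keys = [f"w.{i}" for i in range(n_fields)]
    for cs in product(c_values, repeat=n_fields):
        for ws in product(w_values, repeat=n_fields):
            yield {**dict(zip(w_keys, ws)), **dict(zip(c_keys, cs))}

# Placeholder ranges: 3 candidate c values and 2 candidate weights per field.
grid = list(bm25f_control_grid(2, c_values=[0.25, 0.5, 0.75], w_values=[0.5, 1.0]))

# Each dict in `grid` could then be passed as controls=... when constructing
# the retriever and evaluated on training topics; that step is omitted here.
```

The grid grows exponentially with the number of fields, so coarse ranges are a sensible starting point. PyTerrier's own `pt.GridSearch` may be a more direct route if it fits your setup.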

Any update guys, or can I close the issue?

I think we've got it figured out now, thanks for the help! We'll get back to you if we need to re-open.