This readme is heavily based (i.e. copied from) the Anserini readme.
This is the docker image for BM25PRF conforming to the OSIRRC jig for the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019.
This image is available on Docker Hub.
This image implemented Bm25+Pseudo Relevance Feedback(PRF) with Anserini.
- Supported test collections:
robust04
- Supported hooks:
init
,index
,search
,interact
The following jig
command can be used to index TREC disks 4/5 for robust04
:
python run.py prepare \
--repo matthewzz/anserini_bm25prf \
--tag latest \
--collections robust04=/path/to/disk45=trectext
The following jig
command can be used to perform a retrieval run on the collection with the robust04
test collection with default params.
python run.py search \
--repo matthewzz/anserini_bm25prf \
--output out \
--qrels qrels/qrels.robust04.txt \
--topic topics/topics.robust04.txt \
--collection robust04 \
--top_k 100
The following jig
command can be used to tune the hyper-parameters (first 50 topics in robust04
will be used as validation set).
Note that the grid search may take several hours.
python run.py interact \
--repo matthewzz/anserini_bm25prf \
--tag latest \
The following numbers should be able to be re-produced using the scripts provided by the jig.
TREC 2004 Robust Track Topics.
- BM25+PRF: k1=0.9, b=0.4, k1_prf=0.9, b_prf=0.4, num_new_temrs=20, num_docs=10, new_term_weight=0.2
Metric | Score |
---|---|
MAP | 0.2928 |
P@30 | 0.3438 |