Build using Maven:
mvn clean package
The eval/ directory contains evaluation tools and scripts, including trec_eval. Before using trec_eval, unpack and compile it as follows:
tar xvfz trec_eval.9.0.tar.gz && cd trec_eval.9.0 && make
Anserini is designed to support experiments on various standard TREC collections out of the box:
- ad hoc retrieval: Experiments on Disks 1 & 2
- ad hoc retrieval: Robust04 experiments on Disks 4 & 5
- ad hoc retrieval: Robust05 experiments on the AQUAINT collection
- ad hoc retrieval: CORE17 experiments on the New York Times collection
- ad hoc retrieval: CORE18 experiments on the Washington Post collection
- ad hoc tweet retrieval: TREC Microblog experiments
- web search: Wt10g collection
- web search: Gov2 collection
- web search: ClueWeb09b collection
- web search: ClueWeb12-B13 collection
- web search: ClueWeb12 collection
Anserini was designed with Python integration in mind, for connecting with popular deep learning toolkits such as PyTorch. This is accomplished via pyjnius. To make this work, tell Maven to explicitly build the fat jar, as follows:
mvn clean package shade:shade
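Before pointing pyjnius at the jar, it can help to confirm that the build actually produced one. The following is a minimal sketch, not part of Anserini: the helper name `find_fatjar` is hypothetical, and the `anserini-*-fatjar.jar` pattern is an assumption generalized from the jar name used in the Python example in this README.

```python
import glob
import os


def find_fatjar(target_dir="target"):
    """Return the path of a fat jar under target_dir, or None if none exists.

    Hypothetical helper: the file name pattern is an assumption based on the
    jar name used elsewhere in this README.
    """
    candidates = glob.glob(os.path.join(target_dir, "anserini-*-fatjar.jar"))
    if not candidates:
        return None
    # If several builds are present, prefer the most recently modified one.
    return max(candidates, key=os.path.getmtime)


if __name__ == "__main__":
    jar = find_fatjar()
    if jar is None:
        print("No fat jar found; run the Maven build first.")
    else:
        print("Found fat jar:", jar)
```

The returned path could then be passed to `jnius_config.set_classpath` instead of a hard-coded version string.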
The SimpleSearcher class provides a simple Python/Java bridge, shown below:
import jnius_config
jnius_config.set_classpath("target/anserini-0.0.1-SNAPSHOT-fatjar.jar")
from jnius import autoclass
JString = autoclass('java.lang.String')
JSearcher = autoclass('io.anserini.search.SimpleSearcher')
searcher = JSearcher(JString('lucene-index.robust04.pos+docvectors+rawdocs'))
hits = searcher.search(JString('hubble space telescope'))
# the docid of the 1st hit
hits[0].docid
# the internal Lucene docid of the 1st hit
hits[0].ldocid
# the score of the 1st hit
hits[0].score
# the full document of the 1st hit
hits[0].content
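To evaluate results with trec_eval, the hits need to be written in the standard TREC run format (`topic Q0 docid rank score tag`). The sketch below shows one way to do that. It assumes only that each hit exposes `docid` and `score` as in the example above and that the hits can be iterated like a Python sequence. The function name `to_trec_run`, the run tag, and the stand-in `Hit` tuple are hypothetical and exist only so the snippet runs without an index.

```python
from collections import namedtuple


def to_trec_run(topic_id, hits, tag="anserini"):
    """Format hits as TREC run lines: 'topic Q0 docid rank score tag'."""
    lines = []
    for rank, hit in enumerate(hits, start=1):
        lines.append("{} Q0 {} {} {:.4f} {}".format(
            topic_id, hit.docid, rank, hit.score, tag))
    return lines


# Stand-in for real search results (hypothetical docids and scores).
Hit = namedtuple("Hit", ["docid", "score"])
demo_hits = [Hit("DOC-A", 12.5), Hit("DOC-B", 9.25)]

for line in to_trec_run("301", demo_hits):
    print(line)
```

With real results, `demo_hits` would be replaced by the `hits` returned from `searcher.search(...)`, and the lines written to a file that trec_eval reads alongside the relevance judgments.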