An information retrieval toolkit built on Lucene

Anserini

Getting Started

Build using Maven:

mvn clean package

The eval/ directory contains evaluation tools and scripts, including trec_eval. Before using trec_eval, unpack and compile it, as follows:

tar xvfz trec_eval.9.0.tar.gz && cd trec_eval.9.0 && make

Running Standard IR Experiments

Anserini is designed to support experiments on various standard TREC collections out of the box.

Tools

Python Interface

Anserini was designed with Python integration in mind, so that it can connect with popular deep learning toolkits such as PyTorch. The integration uses pyjnius, which needs the fat jar on its classpath. To produce it, tell Maven to explicitly build the fat jar, as follows:

mvn clean package shade:shade

The SimpleSearcher class provides a simple Python/Java bridge, shown below:

# The classpath must be set before jnius is imported
import jnius_config
jnius_config.set_classpath("target/anserini-0.0.1-SNAPSHOT-fatjar.jar")

# Load the Java classes needed on the Python side
from jnius import autoclass
JString = autoclass('java.lang.String')
JSearcher = autoclass('io.anserini.search.SimpleSearcher')

# Open the index and run a query
searcher = JSearcher(JString('lucene-index.robust04.pos+docvectors+rawdocs'))
hits = searcher.search(JString('hubble space telescope'))

# the docid of the 1st hit
hits[0].docid

# the internal Lucene docid of the 1st hit
hits[0].ldocid

# the score of the 1st hit
hits[0].score

# the full document of the 1st hit
hits[0].content
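
Since the eval/ directory ships trec_eval, one natural next step is to turn the hits into TREC run lines (topic, Q0, docid, rank, score, tag). The following is a minimal sketch, not part of Anserini: the helper name to_trec_run, the run tag, and the topic id are hypothetical, and it assumes only that each hit exposes the docid and score fields shown above and that the hits can be iterated like a Python sequence. Stub objects stand in for real hits so the sketch runs without the jar.

```python
from types import SimpleNamespace


def to_trec_run(hits, topic_id, run_tag="anserini"):
    """Format hits as TREC run lines: topic Q0 docid rank score tag.

    Hypothetical helper; assumes each hit has .docid and .score attributes.
    """
    lines = []
    for rank, hit in enumerate(hits, start=1):
        lines.append(f"{topic_id} Q0 {hit.docid} {rank} {hit.score:.4f} {run_tag}")
    return lines


# Stub hits with made-up docids and scores, standing in for searcher.search(...)
stub_hits = [
    SimpleNamespace(docid="DOC-A", score=12.5),
    SimpleNamespace(docid="DOC-B", score=11.25),
]

for line in to_trec_run(stub_hits, topic_id="301"):
    print(line)
```

With a real index, the list returned by searcher.search(...) would be passed in place of stub_hits, and the printed lines written to a run file for trec_eval.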