Python package


A Python API for Terrier



  1. Make sure that the JAVA_HOME environment variable is set to the location of your Java installation
  2. pip install python-terrier
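
Before installing, it can help to confirm that JAVA_HOME is set and points at a real directory. The following is a minimal sketch using only the standard library; the helper name java_home_status is hypothetical and is not part of PyTerrier.

```python
import os


def java_home_status(env=None):
    """Return (ok, message) describing whether JAVA_HOME looks usable.

    Hypothetical helper for illustration only; not part of PyTerrier.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return False, "JAVA_HOME is not set"
    if not os.path.isdir(java_home):
        return False, f"JAVA_HOME points to a missing directory: {java_home}"
    return True, f"JAVA_HOME = {java_home}"


if __name__ == "__main__":
    ok, message = java_home_status()
    print(message)
```

This only checks that the directory exists; it does not verify that a working Java installation lives there.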


PyTerrier is not available for Windows because pytrec_eval is not available for Windows. If you can compile and install pytrec_eval yourself, it should work fine.

Colab notebooks

import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
!pip install python-terrier


Indexing TREC formatted collections

You can create an index from a TREC-formatted collection using TRECCollectionIndexer.
For TXT, PDF, Microsoft Word and similar files, you can use FilesIndexer.
For a pandas DataFrame, you can use DFIndexer.
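
For readers unfamiliar with the TREC format, the sketch below writes plain-text documents in the common TREC markup. It assumes the usual DOC, DOCNO and TEXT tag names and that the text itself contains no markup; the helper names are hypothetical and are not part of PyTerrier.

```python
def to_trec_doc(docno, text):
    """Format one document in TREC markup (assumed tags: DOC, DOCNO, TEXT)."""
    return (
        "<DOC>\n"
        f"<DOCNO>{docno}</DOCNO>\n"
        f"<TEXT>\n{text}\n</TEXT>\n"
        "</DOC>\n"
    )


def to_trec_collection(docs):
    """Concatenate (docno, text) pairs into one TREC-formatted string."""
    return "".join(to_trec_doc(docno, text) for docno, text in docs)


if __name__ == "__main__":
    sample = [
        ("d1", "terrier is an information retrieval platform"),
        ("d2", "python bindings for terrier"),
    ]
    print(to_trec_collection(sample))
```

The resulting text can be saved to a file and used as input for a TREC collection indexer.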

Retrieval and Evaluation

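import pyterrier as pt
pt.init()  # start the Java VM; requires JAVA_HOME

topics = pt.Utils.parse_trec_topics_file(topicsFile)  # topicsFile: path to a TREC topics file
qrels = pt.Utils.parse_qrels(qrelsFile)  # qrelsFile: path to a TREC qrels file
BM25_br = pt.BatchRetrieve(index, "BM25")  # index: index reference or path to an index on disk
res = BM25_br.transform(topics)
pt.Utils.evaluate(res, qrels, metrics=['map'])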
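
To make the 'map' metric concrete, here is a pure-Python sketch of mean average precision computed from TREC qrels lines of the form "qid iteration docno label". It is a simplified illustration, not the pytrec_eval implementation, and may differ from it in edge cases (for example, queries with no retrieved documents). All function names are hypothetical.

```python
def parse_qrels_text(text):
    """Parse TREC qrels lines: 'qid iteration docno label'."""
    qrels = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _iteration, docno, label = parts
        qrels.setdefault(qid, {})[docno] = int(label)
    return qrels


def average_precision(ranking, judged):
    """Sum of precision at each rank holding a relevant document,
    divided by the number of relevant documents for the query."""
    relevant = {docno for docno, label in judged.items() if label > 0}
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(run, qrels):
    """Average AP over every query that has relevance judgements."""
    scores = [average_precision(run.get(qid, []), judged)
              for qid, judged in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    qrels = parse_qrels_text("1 0 d1 1\n1 0 d2 0\n1 0 d3 1\n")
    run = {"1": ["d1", "d2", "d3"]}  # ranked docnos per query id
    print(round(mean_average_precision(run, qrels), 4))  # 0.8333
```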

Experiment - Perform Retrieval and Evaluation with a single function

We provide an Experiment function, which lets you compare multiple retrieval approaches on the same queries and relevance assessments:

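PL2_br = pt.BatchRetrieve(index, "PL2")  # second retrieval approach to compare
pt.Experiment(topics, [BM25_br, PL2_br], eval_metrics, qrels)  # eval_metrics: e.g. ['map']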
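
The idea behind such a comparison can be sketched in plain Python: every system is run on the same topics and scored against the same relevance assessments, giving one figure per system. This is an illustration of the concept only, not how pt.Experiment is implemented; the toy systems, the precision-at-k metric and all names below are assumptions.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k documents that are relevant."""
    return sum(1 for docno in ranking[:k] if docno in relevant) / k


def compare_systems(systems, topics, qrels, k=2):
    """Run every system on the same topics; return {name: mean P@k}."""
    results = {}
    for name, retrieve in systems.items():
        per_query = [
            precision_at_k(retrieve(query), qrels.get(qid, set()), k)
            for qid, query in topics.items()
        ]
        results[name] = sum(per_query) / len(per_query) if per_query else 0.0
    return results


if __name__ == "__main__":
    topics = {"q1": "terrier", "q2": "python"}
    qrels = {"q1": {"d1", "d2"}, "q2": {"d3"}}  # relevant docnos per query
    # Toy systems: each maps a query string to a ranked list of docnos.
    systems = {
        "system_a": lambda q: ["d1", "d2"] if q == "terrier" else ["d3", "d4"],
        "system_b": lambda q: ["d4", "d1"] if q == "terrier" else ["d4", "d5"],
    }
    for name, score in compare_systems(systems, topics, qrels).items():
        print(f"{name}: P@2 = {score:.2f}")
```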

Learning to Rank

First, create a FeaturesBatchRetrieve(index, features) object with the desired features.

Call its transform(topics_set) function with the train, validation and test topic sets to get dataframes of feature scores, and use them to train your chosen model.

Use your trained model to predict the scores for test_topics, and evaluate the result with pt.Utils.evaluate().

BM25_with_features_br = pt.FeaturesBatchRetrieve(index, ["WMODEL:BM25F", "WMODEL:PL2F"], controls={"wmodel" : "BM25"})
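
The train-then-rerank steps can be illustrated without any external library. The sketch below stands in for "your chosen model" with a tiny linear scorer fitted by stochastic gradient descent; the feature vectors are plain lists rather than the dataframes PyTerrier returns, and every name and value here is an assumption made for illustration.

```python
def train_linear(rows, epochs=200, lr=0.1):
    """Fit linear weights by stochastic gradient descent on squared error.

    rows: list of (feature_vector, relevance_label) pairs.
    """
    weights = [0.0] * len(rows[0][0])
    for _ in range(epochs):
        for features, label in rows:
            pred = sum(w * x for w, x in zip(weights, features))
            err = pred - label
            weights = [w - lr * err * x for w, x in zip(weights, features)]
    return weights


def rerank(candidates, weights):
    """Sort (docno, feature_vector) pairs by learned score, best first."""
    scored = [
        (docno, sum(w * x for w, x in zip(weights, feats)))
        for docno, feats in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy training data: two feature scores per document, plus a label.
    train = [
        ([1.0, 0.0], 0),
        ([0.0, 1.0], 1),
        ([1.0, 1.0], 1),
        ([0.5, 0.0], 0),
    ]
    weights = train_linear(train)
    test_candidates = [("a", [0.9, 0.1]), ("b", [0.1, 0.9])]
    for docno, score in rerank(test_candidates, weights):
        print(docno, round(score, 3))
```

In practice a dedicated learning-to-rank library would replace train_linear, with the reranked output evaluated against the test qrels.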


Create an LTR_pipeline object with the following arguments:

  1. Index reference or path to index on disk
  2. Weighting model name
  3. Features list
  4. Qrels
  5. LTR model

Call the fit() method on the created object with the training topics.

Evaluate the results with the Experiment function, using the test topics.

pt.LTR_pipeline(index, model, features, qrels, LTR)

  • Alex Tsolov, University of Glasgow
  • Craig Macdonald, University of Glasgow