This set of classes is a Lucene implementation of the SPUD retrieval model that appears in "A Polya Urn Document Language Model for Improved Information Retrieval" by Ronan Cummins, Jiaul Hoque Paik, and Yuanhua Lv.

The classes depend on the following publicly available jar files:

lucene-core-5.0.0.jar
lucene-queryparser-5.0.0.jar
lucene-analyzers-common-5.0.0.jar
lucene-queries-5.0.0.jar
commons-math3-3.3.jar
jsoup-1.7.3.jar

To build the classes, create a "classes" directory at the same level as "src":

>mkdir classes

Then run:

>make all

Included in this download is the Cranfield collection (modified to the TREC format). The three important files for the modified Cranfield collection are:

cran.all.1400.trec-format (the documents)
cran.qry.trec-format (the queries)
cran.qrels.trec-format (the qrels)

The only two classes with main methods are:

indexing.LuceneTRECIndexer
searching.QuerySearch

To index the Cranfield collection, create an index file containing the full paths of the files that you wish to index. For the Cranfield collection, the index file should contain only one line, e.g.

././cran.all.1400.trec-format

Then, from the classes directory, run:

>java -cp .:../lib/* indexing.LuceneTRECIndexer ../cranfield-collection/lucene_index ../cranfield-collection/index-file 1 0

This will create the index in the "lucene_index" directory.

You can then run the queries on the collection from the classes directory as follows:

>java -cp .:../lib/* searching.QuerySearch ../cranfield-collection/lucene_index ../cranfield-collection/cran.qry.trec-format ../cranfield-collection/cran.qrels.trec-format

This should run the basic SPUD model using the queries and also calculate some effectiveness metrics for the queries.

Copyright © 2015 Ronan Cummins

This work is free. It comes without any warranty to the extent permitted by applicable law. You can redistribute it and/or modify it under the terms of the Do What The Fuck You Want To Public License, Version 2, as published by Sam Hocevar.
See http://www.wtfpl.net/ for more details.
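
Appendix: generating the index file

The index file described in the indexing step above can be written by hand, but a small helper may be convenient when indexing more than one file. The following is a minimal sketch only. The class name IndexFileWriter and its method writeIndexFile are hypothetical and are not part of this download, and the sketch assumes the indexer expects one path per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the SPUD classes.
// Assumption: the index file holds one full path per line.
class IndexFileWriter {

    // Writes the absolute path of each document to indexFile, one per line,
    // and returns the lines that were written.
    static List<String> writeIndexFile(Path indexFile, List<Path> documents) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Path doc : documents) {
            // Resolve relative paths against the current working directory.
            lines.add(doc.toAbsolutePath().normalize().toString());
        }
        Files.write(indexFile, lines, StandardCharsets.UTF_8);
        return lines;
    }

    // Usage: java IndexFileWriter <index-file> <document> [<document> ...]
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: IndexFileWriter <index-file> <document> [<document> ...]");
            return;
        }
        List<Path> documents = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            documents.add(Paths.get(args[i]));
        }
        // Echo each path that was written so the result can be checked by eye.
        for (String line : writeIndexFile(Paths.get(args[0]), documents)) {
            System.out.println(line);
        }
    }
}
```

For the Cranfield collection, this would be called with the index file as the first argument and cran.all.1400.trec-format as the only document, giving an index file with a single line.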