CS7IS3-Lucene-Cranfield

CA Project 1 for CS7IS3 Information Retrieval and Web Search

Getting Started

Setting up trec_eval

git clone https://github.com/usnistgov/trec_eval.git
cd trec_eval
make
mv trec_eval /usr/local/bin/

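Run trec_eval with no arguments to see its usage instructions. (Moving the binary into /usr/local/bin usually requires root privileges, so you may need to prefix the mv command with sudo.)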
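trec_eval takes two whitespace-separated files: a qrels file (query_id iteration doc_id relevance) and a run file with six columns per line (query_id Q0 doc_id rank score run_tag). The sketch below is a hypothetical helper, not part of this repository: check_run_file only confirms that every non-blank line of a run file has six fields before you hand it to trec_eval. It does not validate the values themselves.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): report any line of a TREC run
# file that does not have the six fields "qid Q0 docid rank score tag".
# Blank lines are skipped. Returns 0 if every other line has six fields, 1 otherwise.
check_run_file() {
  awk '
    NF == 0 { next }
    NF != 6 {
      printf "%s:%d: expected 6 fields, found %d\n", FILENAME, NR, NF > "/dev/stderr"
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

Once the project below has been run, a single run could be checked and then scored from the project root, for example:

check_run_file src/cran/custom_bm25.results && trec_eval src/formatted_qrel.test src/cran/custom_bm25.results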

Setting up the project

git clone https://github.com/ruthbrennankk/CS7IS3-Lucene-Cranfield.git
cd CS7IS3-Lucene-Cranfield
./run.sh

(Note: if run.sh is not executable, grant it execute permission first with chmod +x run.sh. Root access is not needed for this if you own the file.)

Key Directories

  1. Inputs
     - The Cranfield collection is stored at CS7IS3-Lucene-Cranfield/src/cran/cran.all.1400
     - The Cranfield queries are stored at CS7IS3-Lucene-Cranfield/src/cran/cran.qry
     - The reformatted Cranfield query relevance judgements are stored at CS7IS3-Lucene-Cranfield/src/formatted_qrel.test
  2. System Outputs
     - Results from a run with a custom analyzer and classic similarity are stored at CS7IS3-Lucene-Cranfield/src/cran/custom_classic.results
     - Results from a run with a custom analyzer and BM25 similarity are stored at CS7IS3-Lucene-Cranfield/src/cran/custom_bm25.results
     - Results from a run with the English language analyzer and classic similarity are stored at CS7IS3-Lucene-Cranfield/src/cran/english_classic.results
     - Results from a run with the English language analyzer and BM25 similarity are stored at CS7IS3-Lucene-Cranfield/src/cran/english_bm25.results
  3. trec_eval Outputs
     - The outputs of the trec_eval commands that run.sh executes on each of the above runs are stored in the performance folder
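The evaluation step can be reproduced by hand with a loop over the four results files. This is a minimal sketch, not the contents of run.sh: the function name evaluate_all_runs and the ".eval" suffix of the report files are hypothetical, and only the run names and the qrels path come from the list above.

```shell
#!/bin/sh
# Hypothetical sketch of the evaluation step (not copied from run.sh).
# Arguments: qrels file, directory holding <run>.results, output directory.
# Writes one trec_eval report per run to <output dir>/<run>.eval (assumed suffix).
evaluate_all_runs() {
  qrels=$1
  results_dir=$2
  out_dir=$3
  mkdir -p "$out_dir" || return 1
  for run in custom_classic custom_bm25 english_classic english_bm25; do
    # trec_eval reads the qrels file first, then the run file.
    trec_eval "$qrels" "$results_dir/$run.results" > "$out_dir/$run.eval" || return 1
  done
}
```

Assuming the performance folder sits at the project root, the call from that root would be:

evaluate_all_runs src/formatted_qrel.test src/cran performance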