Aditya Kumar Shrivastava, TCD ID: 19323354.
- This project implements a search engine using Apache Lucene.
- We index and search documents from the Cranfield dataset using Lucene.
- This search engine is evaluated using trec_eval.
- Linux VM Public IP Address: 20.105.177.112
- Connect using a client such as PuTTY and use the .ppk file for authorization.
- Login as user: azureuser
- Once logged in, switch user to root using the command: 'sudo su -'
- Run the command 'whoami'; the output should be root.
- To run the indexer and searcher, switch to the directory containing the assignment code. The directory is named “ahsrivas_CS7IS3_Assignment1”. You can type ls to list the directories and then use cd to switch to it.
- Follow the instructions in the README file:
Step 1: Perform a Maven clean
mvn clean
Step 2: Download dependencies and build the project
mvn install
Step 3: Run the RunIndexerSearcher program to index the Cranfield dataset and execute searches on it.
mvn exec:java -Dexec.mainClass="RunIndexerSearcher"
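The three steps above can also be chained so that a failed step stops the run. The following is only a minimal sketch: the function name run_steps and the MVN variable are assumptions introduced for illustration and are not part of the project.

```shell
#!/bin/sh
# Sketch: run the three Maven steps in order and stop at the first failure.
# MVN is a hypothetical override (not part of the project); it defaults to mvn.
MVN="${MVN:-mvn}"

run_steps() {
    "$MVN" clean &&
    "$MVN" install &&
    "$MVN" exec:java -Dexec.mainClass="RunIndexerSearcher"
}
```

Calling run_steps from the assignment directory runs all three commands; the && chaining means the search step only starts if the clean and install steps succeeded.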
Once the “results_ashrivas.txt” file, which contains the results of searching the Cranfield dataset, has been generated, evaluate the results using trec_eval by running the following command:
./trec_eval/trec_eval QRelsCorrectedforTRECeval results_ashrivas.txt
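trec_eval reads result files with six whitespace-separated columns per line (query id, iteration, document id, rank, score, run tag). Before running the evaluation command above, the results file can be given a quick sanity check. This is a minimal sketch: check_results_format is a hypothetical helper, and it assumes the program writes one result per line with no blank lines.

```shell
#!/bin/sh
# Sketch: report lines that do not have the six columns trec_eval reads
# (query id, iteration, document id, rank, score, run tag).
# check_results_format is a hypothetical helper, not part of the project.
check_results_format() {
    awk '
        BEGIN { bad = 0 }
        NF != 6 { printf "line %d: expected 6 fields, found %d\n", NR, NF; bad = 1 }
        END { exit bad }
    ' "$1"
}
```

For example, check_results_format results_ashrivas.txt returns a non-zero status and lists the offending lines if any line has the wrong number of columns.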
Please note:
- When you run the Java program 'RunIndexerSearcher.java', the Indexer and Searcher are run with the BM25 similarity and the English Analyzer.
- This combination is the default because, when the evaluation was carried out with various similarity scoring methods and analyzers, BM25 with the English Analyzer achieved the best scores.
- To run the retrieval system with a different analyzer or similarity scoring method, change the corresponding variable assignment in the code.
- For example, the code presently sets analyzer = englishAnalyzer. You can change this line to analyzer = buildCustomAnalyzer;
- buildCustomAnalyzer is defined in the Index.java and Searcher.java files, so you should be able to switch the analyzer without further changes.
- If you are cloning the repository for the first time on the VM, make sure to run the 'make' command in the trec_eval directory.
- If you are not cloning the repository for the first time, trec_eval has already been built on the virtual machine and the 'make' command does not need to be run.
- QRelsCorrectedforTrecEval should be used when you evaluate the search results generated by the program.
- The results folder contains the results for various combinations of analyzers and similarity scoring techniques. Please feel free to evaluate these files with trec_eval.
- To evaluate these result files with trec_eval, you can find all the commands in the file 'trec_eval_evaluation_commands.txt'.
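The note above about running 'make' only on a fresh clone can be expressed as a small check. This is a sketch under assumptions: ensure_trec_eval is a hypothetical helper, and it assumes the built binary is the file trec_eval inside the trec_eval directory, as in the evaluation command.

```shell
#!/bin/sh
# Sketch: build trec_eval only when its binary is missing.
# ensure_trec_eval is a hypothetical helper; its argument is the trec_eval directory.
ensure_trec_eval() {
    dir="$1"
    if [ -f "$dir/trec_eval" ] && [ -x "$dir/trec_eval" ]; then
        # Binary is present and executable, so no rebuild is needed.
        echo "trec_eval already built"
    elif [ -d "$dir" ]; then
        # Directory exists but the binary does not: build it in place.
        (cd "$dir" && make)
    else
        echo "directory not found: $dir" >&2
        return 1
    fi
}
```

Running ensure_trec_eval trec_eval from the assignment directory either reports that the binary is already present or runs make inside that directory.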