individual assignment for CS7IS3 Information Retrieval

Primary language: Java

cs7is3-ir-indi

The individual assignment for CS7IS3 Information Retrieval.

IMPORTANT

The project lives under the user's home directory.

Project root: ~/cs7is3-indi.

Directory structure:

user4ta@ip-172-31-35-243:~/cs7is3-indi$ tree .
.
├── QRelsCorrectedforTRECeval
├── README.md
├── cran
│   ├── cranqrel
│   └── cranqrel.readme
├── data
│   ├── result_EnglishAnalyzer.gnuplot
│   ├── result_EnglishAnalyzer.png
│   ├── result_SimpleAnalyzer.gnuplot
│   ├── result_SimpleAnalyzer.png
│   ├── result_WhitespaceAnalyzer.gnuplot
│   └── result_WhitespaceAnalyzer.png
├── pom.xml
├── src
│   ├── main
│   │   ├── java
│   │   │   └── life
│   │   │       └── tannineo
│   │   │           └── cs7is3
│   │   │               └── indi
│   │   │                   ├── App.java
│   │   │                   └── EnumTag.java
│   │   └── resources
│   │       └── cran
│   │           ├── cran.all.1400
│   │           └── cran.qry
│   └── test
│       └── java
│           └── life
│               └── tannineo
│                   └── cs7is3
│                       └── indi
│                           └── AppTest.java
├── trec_eval
└── trec_eval-9.0.7.tar.gz

Generated Data

Generated data already exists in the ./data folder,

along with the corresponding *.gnuplot files.

Compile

Run mvn package; a jar file will be generated at ./target/cs7is3-indi-1.0.0-SNAPSHOT.jar.

Run

Run java -jar target/cs7is3-indi-1.0.0-SNAPSHOT.jar. It runs 3 x 3 = 9 strategies to index and search the Cranfield data.
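The 3 x 3 grid can be pictured with a small sketch. It assumes that each strategy pairs one of the three analyzers named in ./data with one of three similarities, and that result files follow the data/result_<Analyzer>_<Similarity> pattern seen in the TREC_Eval command. Only LMDirichletSimilarity is named in this README; BM25Similarity and ClassicSimilarity are placeholders chosen for illustration, and the StrategyGrid class is hypothetical, not part of App.java.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of App.java: lists the result files that a
// 3 x 3 analyzer/similarity grid would produce.
public class StrategyGrid {

    // Analyzer names taken from the file names in ./data.
    static final String[] ANALYZERS = {
        "EnglishAnalyzer", "SimpleAnalyzer", "WhitespaceAnalyzer"
    };

    // Only LMDirichletSimilarity is named in this README; the other two
    // entries are assumptions used for illustration.
    static final String[] SIMILARITIES = {
        "BM25Similarity", "ClassicSimilarity", "LMDirichletSimilarity"
    };

    // Builds one result path per (analyzer, similarity) pair.
    static List<String> resultFiles(String[] analyzers, String[] similarities) {
        List<String> files = new ArrayList<>();
        for (String analyzer : analyzers) {
            for (String similarity : similarities) {
                files.add("data/result_" + analyzer + "_" + similarity);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        // Prints nine paths, one per strategy.
        for (String file : resultFiles(ANALYZERS, SIMILARITIES)) {
            System.out.println(file);
        }
    }
}
```

Under these assumptions, each printed path is a file that can be passed to trec_eval after the jar has run.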

TREC_Eval

A compiled trec_eval is in the project root folder.

The file called QRelsCorrectedforTRECeval is also in the project root folder.

Once the program has been compiled and run, use a command like this to see the evaluation results:

./trec_eval QRelsCorrectedforTRECeval data/result_WhitespaceAnalyzer_LMDirichletSimilarity
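To evaluate several result files without retyping the command, the call above could be wrapped in a few lines of Java. This is only a sketch: TrecEvalRunner is a hypothetical class, not part of the project, and it assumes it is launched from the project root so that ./trec_eval and QRelsCorrectedforTRECeval resolve.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper, not part of the project: runs ./trec_eval on one
// result file, assuming the working directory is the project root.
public class TrecEvalRunner {

    // Mirrors the command shown above: ./trec_eval <qrels> <results>.
    static List<String> buildCommand(String resultFile) {
        return Arrays.asList("./trec_eval", "QRelsCorrectedforTRECeval", resultFile);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Default to the result file used in the README example.
        String resultFile = args.length > 0
            ? args[0]
            : "data/result_WhitespaceAnalyzer_LMDirichletSimilarity";

        Process process = new ProcessBuilder(buildCommand(resultFile))
            .inheritIO() // show trec_eval's output in this terminal
            .start();

        // Propagate trec_eval's exit code.
        System.exit(process.waitFor());
    }
}
```

Passing a different result path as the first argument evaluates that file instead.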

Requirement

The AWS instance uses OpenJDK 8:

user4ta@ip-172-31-35-243:~/cs7is3-indi$ java -version
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)

gcc info:

gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)