
Information Retrieval Engine, Information Retrieval 2016. University of Aveiro

Primary language: Java. License: MIT.

Information Retrieval Engine

Requirements

Execution

java -jar IR-maven.jar <path to corpus folder> <path to stopword file> <max memory>

--

Task 1

Modelling: classes and main methods definition. a) Keep in mind modularity and flexibility. b) Describe your classes, main methods, and data flow in the report.

Task 2

Implement a simple corpus reader, tokenizer, and Boolean indexer. a) Develop your own tokenizer from scratch. Integrate the Porter stemmer (http://snowball.tartarus.org/) and a stopword filter in your code. b) Index a small corpus (to be defined later) and submit a text file with the resulting index, following the scheme: term,document frequency,list of documents
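The tokenizer, stopword filter, and Boolean index from Task 2 can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the class name `BooleanIndexSketch` and its methods are hypothetical, the Porter stemmer call is left as a placeholder comment, and the output assumes that the "list of documents" is written as comma-separated document ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of a Boolean indexer; names are illustrative only.
class BooleanIndexSketch {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();
    private final Set<String> stopwords;

    BooleanIndexSketch(Set<String> stopwords) {
        this.stopwords = stopwords;
    }

    // Splits on every character that is not a letter or digit and lowercases the tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Adds one document to the index, skipping stopwords.
    void addDocument(int docId, String text) {
        for (String token : tokenize(text)) {
            if (stopwords.contains(token)) {
                continue;
            }
            // Placeholder: a Porter stemmer would be applied to the token here.
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    // Serializes the index as "term,document frequency,doc ids" (assumed comma-separated ids).
    List<String> toLines() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, TreeSet<Integer>> e : postings.entrySet()) {
            StringBuilder sb = new StringBuilder(e.getKey());
            sb.append(',').append(e.getValue().size());
            for (int id : e.getValue()) {
                sb.append(',').append(id);
            }
            lines.add(sb.toString());
        }
        return lines;
    }
}
```

Using sorted maps and sets keeps both the terms and the document ids in a stable order, which makes the resulting text file easy to compare between runs.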

Task 3

Implement an indexer based on the vector-space model, using the tf-idf weighting scheme and lnc.ltc strategy, as described in the slides. a) Write your index to disk so that the searcher module can efficiently load it. b) Index the corpus (to be defined later on).
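As a rough illustration of the lnc.ltc strategy in its standard SMART-notation meaning: document vectors use logarithmic term frequency, no idf, and cosine normalization (lnc), while query vectors use logarithmic term frequency, idf, and cosine normalization (ltc). The sketch below assumes base-10 logarithms and idf = log10(N/df); the course slides may define these details differently, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of lnc.ltc weighting; not taken from the project's code.
class LncLtcSketch {
    // Logarithmic term frequency: 1 + log10(tf) for tf > 0, else 0.
    static double logTf(int tf) {
        return tf > 0 ? 1.0 + Math.log10(tf) : 0.0;
    }

    // Divides every weight by the Euclidean length of the vector.
    static Map<String, Double> cosineNormalize(Map<String, Double> weights) {
        double sumOfSquares = 0.0;
        for (double w : weights.values()) {
            sumOfSquares += w * w;
        }
        double norm = Math.sqrt(sumOfSquares);
        Map<String, Double> normalized = new HashMap<>();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            normalized.put(e.getKey(), norm == 0.0 ? 0.0 : e.getValue() / norm);
        }
        return normalized;
    }

    // lnc: document side, log tf, no idf, cosine normalization.
    static Map<String, Double> lncWeights(Map<String, Integer> termFreqs) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            weights.put(e.getKey(), logTf(e.getValue()));
        }
        return cosineNormalize(weights);
    }

    // ltc: query side, log tf times idf, cosine normalization.
    // Query terms that do not occur in the corpus are skipped.
    static Map<String, Double> ltcWeights(Map<String, Integer> queryTermFreqs,
                                          Map<String, Integer> docFreqs,
                                          int numDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : queryTermFreqs.entrySet()) {
            Integer df = docFreqs.get(e.getKey());
            if (df == null || df == 0) {
                continue;
            }
            double idf = Math.log10((double) numDocs / df);
            weights.put(e.getKey(), logTf(e.getValue()) * idf);
        }
        return cosineNormalize(weights);
    }
}
```

Because the document weights are normalized at indexing time, the index written to disk can store the final lnc weight for each posting, and the searcher only has to compute the ltc weights of the query.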

Task 4

Implement a ranked retrieval method. a) Load the index from disk.
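One possible shape for the ranking step is term-at-a-time scoring: for each query term, walk its postings and accumulate the product of the query weight and the document weight, then sort the documents by score. The sketch below assumes the index has already been loaded into memory as a map from term to (document id, normalized weight) pairs. That in-memory layout, the class name, and the method name are assumptions for illustration, and the on-disk format is not covered here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of ranked retrieval over an already loaded index.
class RankedRetrievalSketch {
    // Returns the ids of the top k documents, ordered by descending score
    // (the dot product of query and document weights). Ties are broken by ascending id.
    static List<Integer> rank(Map<String, Map<Integer, Double>> index,
                              Map<String, Double> queryWeights,
                              int k) {
        Map<Integer, Double> scores = new HashMap<>();
        for (Map.Entry<String, Double> q : queryWeights.entrySet()) {
            Map<Integer, Double> postings = index.get(q.getKey());
            if (postings == null) {
                continue; // query term does not occur in the corpus
            }
            for (Map.Entry<Integer, Double> p : postings.entrySet()) {
                scores.merge(p.getKey(), q.getValue() * p.getValue(), Double::sum);
            }
        }

        List<Map.Entry<Integer, Double>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : Integer.compare(a.getKey(), b.getKey());
        });

        List<Integer> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            top.add(ranked.get(i).getKey());
        }
        return top;
    }
}
```

With cosine-normalized weights on both sides, the accumulated dot product equals the cosine similarity between the query and each document.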

Authors