Information-Retrieval-System-CS465

Kettering University - CS465 Winter 2022 - Assignment 1

Primary Language: Python

Project Requirements

  • Design and implement a simple IR system.

  • The system should:

    • Create the inverted index (the dictionary and postings lists) for your collection of documents

    • Parse and execute simple queries

    • Perform simple tokenization and normalization of the text, such as removing digits and punctuation marks.

    • Statistics:

      • Report the number of distinct words observed in each document, and the total number of words encountered.

      • Report the number of distinct words observed in the whole collection of documents, and the total number of words encountered.

      • Report the total number of times each word is seen (term frequency) and the document IDs where the word occurs (that is, output the postings list for a term).

      • Report the 100th, 500th, and 1000th most frequent words and their frequencies of occurrence.

      • Create postings and assign a term frequency to every document in each postings list.

      • Provide a simple GUI to test the system.
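
The requirements above can be illustrated with a minimal sketch. This is not the repository's actual code: the function names (tokenize, build_index, collection_frequency, nth_most_frequent, and_query), the two sample documents, and the choice of AND semantics for queries are all assumptions made for illustration. The sketch normalizes text by lowercasing it and keeping only alphabetic runs, stores each postings list as a mapping from document ID to term frequency, and derives the statistics from that index.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only alphabetic runs (drops digits and punctuation)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Build an inverted index from {doc_id: text}.

    Returns (index, doc_stats):
      index     maps term -> {doc_id: term frequency in that document}
      doc_stats maps doc_id -> {"distinct": ..., "total": ...} word counts
    """
    index = defaultdict(dict)
    doc_stats = {}
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        doc_stats[doc_id] = {"distinct": len(counts), "total": sum(counts.values())}
        for term, tf in counts.items():
            index[term][doc_id] = tf
    return dict(index), doc_stats


def collection_frequency(index):
    """Total occurrences of each term across the whole collection."""
    return Counter({term: sum(postings.values()) for term, postings in index.items()})


def nth_most_frequent(index, n):
    """Return (term, frequency) for the n-th most frequent term, or None if out of range."""
    ranked = collection_frequency(index).most_common()
    return ranked[n - 1] if 1 <= n <= len(ranked) else None


def and_query(index, query):
    """Return sorted doc IDs containing every query term (assumed AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    doc_sets = [set(index.get(term, {})) for term in terms]
    return sorted(set.intersection(*doc_sets))


# Hypothetical two-document collection used only for demonstration.
docs = {
    1: "The quick brown fox, aged 3, jumps!",
    2: "The lazy dog sleeps; the fox watches.",
}
index, doc_stats = build_index(docs)

print(doc_stats)                      # per-document distinct and total word counts
print(len(index))                     # distinct words in the whole collection
print(index["the"])                   # postings list with term frequencies
print(nth_most_frequent(index, 1))    # most frequent word and its frequency
print(and_query(index, "lazy fox"))   # documents containing both terms
```

With a real collection, nth_most_frequent(index, 100) and its 500 and 1000 counterparts would produce the ranked-word report; for a collection with fewer distinct words than the requested rank, this sketch returns None. A GUI could call build_index once at startup and and_query on each search.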

Help

To access the repo, email me at migl8239@kXXX.edu