pedrobiqua/Search_Engine

Implement Reverse Indexing in C++


We need to implement a reverse index (often called an inverted index) in C++ to speed up document retrieval in our search engine. The index associates each word with the list of documents or pages in which it appears, so a keyword search becomes a lookup rather than a scan over every document.

Tasks:

  • Define the Data Structure:
    Use an efficient structure to store the index (e.g., std::unordered_map or std::map), where the key is a word and the value is a list of documents/pages.

  • Document Parsing:
    Implement a function to process documents or web pages, tokenizing them into words and populating the reverse index.
    Before indexing, remove punctuation and normalize the text to lowercase.

  • Update the Index:
    Implement logic to update the index as documents are added or removed.

  • Search Query:
    Implement a function that, given a word, returns the corresponding documents/pages using the reverse index.

  • Testing:
    Create unit tests to ensure that the index works correctly and that queries return the expected results.
    Test with different dataset sizes to evaluate performance.
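
As a starting point for discussion, the tasks above could be sketched roughly as follows. This is only a minimal sketch, not a proposed final design: the names ReverseIndex, DocId, addDocument, removeDocument, search and tokenize are hypothetical and do not exist in the repository. It assumes documents are identified by integer ids and that the text is ASCII, so any non-ASCII bytes are simply dropped along with punctuation.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical type and class names, chosen only for illustration.
using DocId = int;

class ReverseIndex {
public:
    // Tokenize `text` and record that each word appears in document `id`.
    void addDocument(DocId id, const std::string& text) {
        for (const std::string& word : tokenize(text)) {
            postings_[word].insert(id);
        }
    }

    // Remove every reference to document `id`; words left with no
    // documents are erased so the index does not keep empty entries.
    void removeDocument(DocId id) {
        for (auto it = postings_.begin(); it != postings_.end();) {
            it->second.erase(id);
            if (it->second.empty()) {
                it = postings_.erase(it);  // erase returns the next valid iterator
            } else {
                ++it;
            }
        }
    }

    // Return the ids of documents containing `word`, in ascending order.
    // The query is normalized the same way as indexed text; anything that
    // does not normalize to exactly one word yields an empty result.
    std::vector<DocId> search(const std::string& word) const {
        const std::vector<std::string> tokens = tokenize(word);
        if (tokens.size() != 1) return {};
        const auto it = postings_.find(tokens.front());
        if (it == postings_.end()) return {};
        return std::vector<DocId>(it->second.begin(), it->second.end());
    }

    // Split on whitespace, drop punctuation, and lowercase (ASCII only).
    static std::vector<std::string> tokenize(const std::string& text) {
        std::vector<std::string> words;
        std::string current;
        for (unsigned char c : text) {
            if (std::isalnum(c)) {
                current.push_back(static_cast<char>(std::tolower(c)));
            } else if (std::isspace(c) && !current.empty()) {
                words.push_back(current);
                current.clear();
            }
            // Any other character (punctuation, non-ASCII bytes) is dropped.
        }
        if (!current.empty()) words.push_back(current);
        return words;
    }

private:
    // word -> set of documents containing it (std::set keeps ids unique and sorted)
    std::unordered_map<std::string, std::set<DocId>> postings_;
};
```

With this sketch, indexing "Hello, World!" as document 1 and "hello again" as document 2 makes search("HELLO") return both ids. Note that removeDocument here walks the whole vocabulary; if removals turn out to be frequent, keeping a per-document word list would avoid that scan, at the cost of extra memory.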

Requirements:

  • Familiarity with STL data structures (maps and lists).
  • Basic knowledge of string manipulation and text processing in C++.

References: