Search Engine

The first project of the Talkademy Android Internship.

Notes

Inverted Index

  • Record-level inverted index: contains, for each word, a list of references to the documents in which it appears.
  • Word-level inverted index: additionally contains the positions of each word within each document.
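
To make the difference between the two variants concrete, here is a minimal Java sketch. The class name `IndexShapes`, the method names, and the choice of integer document IDs (the position of each document in the input list) are assumptions made for illustration only; tokenization is plain whitespace splitting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper class: shows the shape of each index, not a full search engine.
class IndexShapes {

    // Record-level: word -> IDs of the documents that contain it.
    static Map<String, Set<Integer>> recordLevel(List<String> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            String[] words = docs.get(docId).trim().toLowerCase(Locale.ROOT).split("\\s+");
            for (String word : words) {
                if (word.isEmpty()) continue; // an empty document yields one empty token
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    // Word-level: word -> document ID -> positions of the word in that document.
    static Map<String, Map<Integer, List<Integer>>> wordLevel(List<String> docs) {
        Map<String, Map<Integer, List<Integer>>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            String[] words = docs.get(docId).trim().toLowerCase(Locale.ROOT).split("\\s+");
            for (int pos = 0; pos < words.length; pos++) {
                if (words[pos].isEmpty()) continue;
                index.computeIfAbsent(words[pos], k -> new TreeMap<>())
                     .computeIfAbsent(docId, k -> new ArrayList<>())
                     .add(pos); // positions are 0-based token offsets
            }
        }
        return index;
    }
}
```

The word-level form costs more memory, but it keeps the positional information that phrase or proximity queries would need.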

Steps to build an inverted index:

  1. Remove stop words: stop words are very frequent words that carry little meaning for search, such as “I”, “the”, “we”, “is”, “an”.
  2. Stem each word: strip affixes from every word read so that only its root form remains. Standard algorithms exist for this, such as the Porter stemmer.
  3. Record document IDs: if the word is already present in the index, add a reference to the document to its entry; otherwise create a new entry for the word.
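
The three steps above can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the class `InvertedIndexSketch`, its methods, and the tiny stop-word list are hypothetical, and `stem` is a naive suffix stripper standing in for a real stemmer such as Porter's.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical record-level index builder following the three steps.
class InvertedIndexSketch {

    // Illustrative stop-word list taken from the examples in the notes; real lists are longer.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("i", "the", "we", "is", "an"));

    private final Map<String, Set<Integer>> index = new TreeMap<>();

    // Naive stand-in for a stemmer: strips one common suffix. This is NOT Porter's algorithm.
    static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            // Keep at least three characters so short words are left alone.
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            // Step 1: skip stop words (and empty tokens produced by the split).
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            // Step 2: reduce the word to its (approximate) root.
            String root = stem(token);
            // Step 3: add the document ID to the existing entry, or create a new entry.
            index.computeIfAbsent(root, k -> new TreeSet<>()).add(docId);
        }
    }

    // Query terms are stemmed the same way so they match the stored roots.
    Set<Integer> lookup(String word) {
        String root = stem(word.toLowerCase(Locale.ROOT));
        return index.getOrDefault(root, Collections.emptySet());
    }
}
```

For example, after `addDocument(1, "The cat is jumping")` and `addDocument(2, "We jumped over cats")`, both `lookup("jumps")` and `lookup("cat")` return the IDs of both documents, while `lookup("the")` returns an empty set because stop words are never indexed.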