The first project of the Talkademy Android Internship
- record-level inverted index : contains, for each word, a list of references to the documents in which it appears.
- word-level inverted index : additionally contains the positions of each word within a document.
- Removing Stop Words : Stop words are very frequent words that carry little meaning for search, such as “I”, “the”, “we”, “is”, and “an”. They are dropped before indexing.
- Stemming to the Root Word : Chop off part of every word that is read so that only its “root word” remains. There are standard tools for this, such as “Porter’s Stemmer”.
- Recording Document IDs : If a word is already present in the index, add a reference to the current document to its entry; otherwise create a new entry for it.
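
The points above can be sketched in a few lines. This is only a minimal illustration, written in Python for brevity, and not the project's actual code: the names `STOP_WORDS`, `tokenize`, `naive_stem` and `build_indexes` are hypothetical, the stop-word list is just the five examples from the list above, and `naive_stem` is a crude suffix chopper standing in for a real stemmer such as Porter's. Positions are counted before stop words are removed, which is one possible convention among several.

```python
import re

# Assumption: a tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"i", "the", "we", "is", "an"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Hypothetical stand-in for a real stemmer (NOT Porter's algorithm):
    chops one common suffix if at least three letters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_indexes(documents):
    """Build both index types from a {doc_id: text} mapping.

    record_index: term -> list of document IDs
    word_index:   term -> {doc_id: [positions of the term in that document]}
    """
    record_index = {}
    word_index = {}
    for doc_id, text in documents.items():
        for position, token in enumerate(tokenize(text)):
            if token in STOP_WORDS:
                continue  # stop words are never indexed
            term = naive_stem(token)
            if term in record_index:
                # Word already present: add a reference to this document.
                if doc_id not in record_index[term]:
                    record_index[term].append(doc_id)
            else:
                # Word not seen before: create a new entry.
                record_index[term] = [doc_id]
            # The word-level index additionally stores the position.
            word_index.setdefault(term, {}).setdefault(doc_id, []).append(position)
    return record_index, word_index


documents = {
    1: "We index the documents",
    2: "Indexing is an easy task",
    3: "The task is indexed",
}
record_index, word_index = build_indexes(documents)
print(record_index["index"])  # [1, 2, 3]
print(word_index["task"])     # {2: [4], 3: [1]}
```

Because “index”, “Indexing” and “indexed” all reduce to the same root, a lookup for that root finds all three documents, while the word-level index also tells where in each document the word occurred.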