This project implements a text search engine.
- Built a large-scale text search engine with the Lucene API on a pre-built index of 500,000+ documents from the ClueWeb09 dataset, including 10% of all Wikipedia pages.
- Wrote parsers for structured queries with operators such as 'AND', 'OR', 'NEAR', 'WEIGHT', and 'WINDOW', as well as for bag-of-words (BoW) queries.
- Implemented several retrieval models: ranked and unranked Boolean, BM25, and Indri.
- Implemented query expansion based on pseudo-relevance feedback and feature-based search using an SVM, which improved precision by 20% on average.
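
To illustrate one of the retrieval models listed above, here is a minimal BM25 scoring sketch in Python. It is not the project's code: it scores small in-memory token lists rather than a Lucene index, the function name `bm25_scores` is hypothetical, the defaults `k1=1.2` and `b=0.75` are common choices assumed here, and the idf uses the non-negative `log(1 + ...)` variant.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document.

    `query` is a list of terms and `docs` is a list of token lists.
    k1 and b are assumed defaults, not values taken from the project.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative idf variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        ["apple", "banana"],
        ["apple", "apple", "cherry"],
        ["dog"],
    ]
    for doc, s in zip(corpus, bm25_scores(["apple"], corpus)):
        print(doc, round(s, 4))
```

Documents that lack every query term score zero, and repeated occurrences of a term raise the score with diminishing returns, which is the main behavioral difference from ranked Boolean retrieval.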