SearchEngineForWikipedia

Given a query, this project searches the Wikipedia corpus (46 GB) and returns the titles of the top ten retrieved documents in ranked order. Queries can be either phrase queries or field-based queries. Multi-level indexes were built to improve retrieval speed. Evaluation is based primarily on the quality of the results and the retrieval time (less than 1 second). Keeping the index size small was also a challenge; compression techniques were used for that purpose.
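The description does not say which compression scheme or index layout the project uses, so the following is only a minimal sketch of one common combination: posting lists stored as gaps and variable-byte encoded, with a small sorted list of block-leading terms acting as a second index level. All names here (`vbyte_encode`, `compress_postings`, `build_two_level`, `lookup`, `block_size`) are hypothetical and chosen for illustration; they are not taken from the repository.

```python
from bisect import bisect_right


def vbyte_encode(numbers):
    """Variable-byte encode non-negative ints (high bit marks a number's last byte)."""
    out = bytearray()
    for n in numbers:
        chunk = [n & 0x7F]
        n >>= 7
        while n:
            chunk.append(n & 0x7F)
            n >>= 7
        chunk.reverse()      # most significant 7-bit group first
        chunk[-1] |= 0x80    # terminator flag on the final byte
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for byte in data:
        if byte & 0x80:
            numbers.append((n << 7) | (byte & 0x7F))
            n = 0
        else:
            n = (n << 7) | byte
    return numbers


def compress_postings(doc_ids):
    """Store sorted doc IDs as gaps between neighbours, then variable-byte encode."""
    if not doc_ids:
        return b""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)


def decompress_postings(data):
    """Rebuild absolute doc IDs from the encoded gaps."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids


def build_two_level(term_postings, block_size=2):
    """Split sorted terms into blocks; the first term of each block forms the top level."""
    terms = sorted(term_postings)
    blocks = [
        [(t, compress_postings(term_postings[t])) for t in terms[i:i + block_size]]
        for i in range(0, len(terms), block_size)
    ]
    first_terms = [block[0][0] for block in blocks]
    return first_terms, blocks


def lookup(first_terms, blocks, term):
    """Binary-search the top level, then scan only the one candidate block."""
    i = bisect_right(first_terms, term) - 1
    if i < 0:
        return []
    for t, data in blocks[i]:
        if t == term:
            return decompress_postings(data)
    return []


# Toy data, invented for illustration only.
toy_index = {"wiki": [1, 5, 900], "search": [2, 3], "engine": [7], "python": [10, 20000]}
first_terms, blocks = build_two_level(toy_index)
wiki_docs = lookup(first_terms, blocks, "wiki")
```

In a corpus-scale setting the blocks would presumably live on disk and only the top level would stay in memory, so a lookup would touch a single block per query term; the in-memory lists above simply stand in for that arrangement.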

Primary Language: Python
