Primary language: Shell

SHAN

## Scalable Search and Web Crawling

The objective of this work was to apply information retrieval concepts to build a scalable framework for crawling unstructured documents on the web, indexing them, and retrieving them. As a case study, we used Wikipedia as the crawl and index target. After crawling and indexing, a GUI deployed in the cloud displays the results and lets the user run personalised queries. Shan (山) is the Chinese character for mountain. The name can also be formed by concatenating the first letters of the components: Solr, Hadoop, Apache Nutch.
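
To make the retrieval step concrete, here is a minimal sketch of how a client such as the GUI might build a query against the index, assuming the indexed documents are served through Solr's standard `/select` handler. The host, the core name `wikipedia`, the `title` field, and the helper names `urlencode` and `solr_query_url` are all hypothetical; none of them come from this repository.

```shell
#!/bin/sh
# Hypothetical defaults: the real host and core name used by SHAN are not
# stated in this README. 8983 is Solr's default port.
SOLR_HOST="${SOLR_HOST:-http://localhost:8983}"
SOLR_CORE="${SOLR_CORE:-wikipedia}"

# Percent-encode an ASCII string so it is safe inside a URL query parameter.
# (Non-ASCII input is out of scope for this sketch.)
urlencode() {
  _s=$1
  _out=
  while [ -n "$_s" ]; do
    _rest=${_s#?}            # everything after the first character
    _c=${_s%"$_rest"}        # the first character itself
    case $_c in
      [a-zA-Z0-9.~_-]) _out=$_out$_c ;;
      *) _out=$_out$(printf '%%%02X' "'$_c") ;;
    esac
    _s=$_rest
  done
  printf '%s\n' "$_out"
}

# Build the URL for a query against Solr's /select handler.
# $1 = query string, $2 = maximum number of rows (defaults to 10).
solr_query_url() {
  printf '%s/solr/%s/select?q=%s&rows=%s&wt=json\n' \
    "$SOLR_HOST" "$SOLR_CORE" "$(urlencode "$1")" "${2:-10}"
}
```

With a Solr instance running, the URL could then be fetched with, for example, `curl -s "$(solr_query_url 'title:mountain' 5)"`. Keeping URL construction separate from the HTTP call lets the query logic be checked without a live server.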