Doodle search engine

The search engine is a complete implementation of a web crawler. Links are taken from a queue and, after politeness and duplicate checks, their HTML documents are fetched and parsed. Finally, the documents are saved into databases. The search engine is configured for our use case, but you can change it. You can set up the necessary tools by following our wiki page.
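The flow described above can be sketched roughly as follows. This is a minimal, in-memory illustration and not the project's actual code: the class name CrawlSketch, the Outcome enum, the injected fetcher function, and the regex-based link extraction are all assumptions made for the example, and plain Java collections stand in for the real queue and databases.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the crawl loop; every name here is illustrative.
final class CrawlSketch {
    enum Outcome { EMPTY, POSTPONED, DUPLICATE, STORED }

    // Naive link extraction: absolute http(s) links in double-quoted href attributes.
    private static final Pattern HREF = Pattern.compile("href=\"(http[^\"]+)\"");

    private final Queue<String> frontier = new ArrayDeque<>();        // stand-in for the link queue
    private final Set<String> seen = new HashSet<>();                 // stand-in for the duplicate check
    private final Map<String, Long> lastVisit = new HashMap<>();      // last fetch time per host
    private final Map<String, String> store = new LinkedHashMap<>();  // stand-in for the databases
    private final Function<String, String> fetcher;                   // URL -> HTML, injected by the caller
    private final long politenessMillis;                              // minimum delay between hits on one host

    CrawlSketch(Function<String, String> fetcher, long politenessMillis) {
        this.fetcher = fetcher;
        this.politenessMillis = politenessMillis;
    }

    void enqueue(String url) {
        frontier.add(url);
    }

    Map<String, String> stored() {
        return store;
    }

    // Takes one link from the queue and runs it through the pipeline.
    // The current time is a parameter so the behaviour is easy to test.
    Outcome crawlOnce(long nowMillis) {
        String url = frontier.poll();
        if (url == null) {
            return Outcome.EMPTY;
        }
        // Politeness: do not hit the same host again too soon.
        String host = URI.create(url).getHost();
        Long last = lastVisit.get(host);
        if (last != null && nowMillis - last < politenessMillis) {
            frontier.add(url); // put it back and try again later
            return Outcome.POSTPONED;
        }
        // Duplicates: skip links that were already crawled.
        if (!seen.add(url)) {
            return Outcome.DUPLICATE;
        }
        // Fetch, parse, and save.
        lastVisit.put(host, nowMillis);
        String html = fetcher.apply(url);
        frontier.addAll(extractLinks(html));
        store.put(url, html);
        return Outcome.STORED;
    }

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // A tiny fake "web" so the sketch runs without network access.
        Map<String, String> fakeWeb = new HashMap<>();
        fakeWeb.put("http://a.example/", "<a href=\"http://b.example/\">b</a>");
        fakeWeb.put("http://b.example/", "<p>no links</p>");

        CrawlSketch crawler = new CrawlSketch(url -> fakeWeb.getOrDefault(url, ""), 1000);
        crawler.enqueue("http://a.example/");
        long now = 0;
        Outcome outcome;
        while ((outcome = crawler.crawlOnce(now)) != Outcome.EMPTY) {
            System.out.println(outcome);
            now += 500;
        }
        System.out.println("Stored pages: " + crawler.stored().keySet());
    }
}
```

Passing the fetcher in as a function keeps the sketch runnable offline; a real fetcher would issue HTTP requests, and the queue, duplicate check, and storage would be backed by the external tools listed under Prerequisites.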

Getting Started

First of all, DON’T PANIC. It will take 5 minutes to get the gist of what DataPirates SearchEngine is all about.

Prerequisites

Before using the search engine, you have to set up the following tools:

Kafka
Hadoop
HBase
ZooKeeper
Elasticsearch
Redis

A complete explanation of which versions to use and how to install them is available on the wiki page.
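Once the tools are installed, a quick way to confirm that they are running is to try opening a TCP connection to each one. The sketch below is a hypothetical helper that is not part of the project. It assumes every service runs on localhost, and the port numbers are only commonly used defaults, so they may not match your installation; adjust them to whatever the wiki page and your configuration specify.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: checks whether the prerequisite services accept TCP connections.
final class PrerequisiteCheck {

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host unknown
        }
    }

    public static void main(String[] args) {
        // ASSUMPTION: commonly used default ports; your setup may use different ones.
        Map<String, Integer> services = new LinkedHashMap<>();
        services.put("Kafka", 9092);
        services.put("Hadoop NameNode", 8020);
        services.put("HBase Master", 16000);
        services.put("ZooKeeper", 2181);
        services.put("Elasticsearch", 9200);
        services.put("Redis", 6379);

        services.forEach((name, port) -> System.out.printf(
                "%-16s port %5d : %s%n",
                name, port,
                isReachable("localhost", port, 500) ? "reachable" : "NOT reachable"));
    }
}
```

A successful connection only shows that something is listening on the port; it does not verify the tool's version or configuration.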

Installing

Download and unzip the project.
Download and install Maven 3 or later.
Create the .jar files by running mvn clean package -DskipTests in the source directory. This creates a fat jar in the target directory of each module.
Run a jar file with the java -jar *.jar command.
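Because the build produces one jar per module, it can help to list them before picking which one to run. The following sketch is a hypothetical helper, not part of the project. It assumes the usual Maven layout, where each module directory sits directly under the source directory and contains its own target directory, and it prints a java -jar command for every jar it finds there.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists the jars that the build placed in each module's target directory.
final class FindModuleJars {

    // Looks for <sourceDir>/target/*.jar and <sourceDir>/<module>/target/*.jar.
    static List<Path> findJars(Path sourceDir) throws IOException {
        Path target = Paths.get("target");
        try (Stream<Path> paths = Files.walk(sourceDir, 3)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .filter(p -> p.getParent() != null && target.equals(p.getParent().getFileName()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the source directory as the first argument; defaults to the current directory.
        Path sourceDir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path jar : findJars(sourceDir)) {
            System.out.println("java -jar " + jar);
        }
    }
}
```

If a target directory holds more than one jar, the listing shows all of them, so you still need to choose the fat jar yourself.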

Built With

Maven 3.6.0 - Dependency Management

Authors

Alireza Asadi
Hamidreza Sharifzadeh
Mohammad Kazem Faghih Khorasani
Mostafa Ojaghi

See also the list of contributors who participated in this project.