
Doodle search engine

The search engine is a complete implementation of a web crawler. Links are taken from a queue and, after politeness and duplicate checks, their HTML documents are fetched and parsed. Finally, the documents are saved into databases. The engine is configured for our use case, but you can change it. You can set up the necessary tools by following our wiki page.
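The crawl loop described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the project's actual code: the class name CrawlSketch, the injected fetcher function, the regex-based link extraction, and the "wait until the host is polite again" policy are all assumptions made for the example. In the real system the queue, the duplicate check, and the storage are presumably backed by the tools listed under Prerequisites.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical in-memory sketch of the crawl loop; names are not taken from the project. */
public class CrawlSketch {
    // Naive link extraction (assumption): absolute http(s) links in double-quoted href attributes.
    private static final Pattern HREF = Pattern.compile("href=\"(http[^\"]+)\"");

    private final Queue<String> frontier = new ArrayDeque<>();        // stands in for the link queue
    private final Set<String> seen = new HashSet<>();                 // duplicate check
    private final Map<String, Long> lastHit = new HashMap<>();        // last fetch time per host
    private final Map<String, String> store = new LinkedHashMap<>();  // stands in for the databases
    private final Function<String, String> fetcher;                   // URL -> HTML, or null on failure
    private final long politenessMillis;                              // minimum delay per host

    public CrawlSketch(Function<String, String> fetcher, long politenessMillis) {
        this.fetcher = fetcher;
        this.politenessMillis = politenessMillis;
    }

    public void enqueue(String url) {
        frontier.add(url);
    }

    /** Processes queued links until the queue is empty or maxPages documents are stored. */
    public Map<String, String> crawl(int maxPages) {
        while (!frontier.isEmpty() && store.size() < maxPages) {
            String url = frontier.poll();
            if (!seen.add(url)) {
                continue; // duplicate: this URL was already processed
            }
            String host = URI.create(url).getHost();
            Long last = lastHit.get(host);
            long wait = (last == null) ? 0 : politenessMillis - (System.currentTimeMillis() - last);
            if (wait > 0) {
                try {
                    Thread.sleep(wait); // politeness: do not hit the same host too often
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            String html = fetcher.apply(url);
            lastHit.put(host, System.currentTimeMillis());
            if (html == null) {
                continue; // fetch failed; nothing to parse or store
            }
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                frontier.add(m.group(1)); // newly discovered links go back into the queue
            }
            store.put(url, html);
        }
        return store;
    }
}
```

Passing the fetcher in as a function keeps the sketch runnable without network access: a map lookup such as pages::get can stand in for an HTTP client when experimenting with the loop.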

Getting Started

First of all, DON’T PANIC. It will take 5 minutes to get the gist of what DataPirates SearchEngine is all about.

Prerequisites

Before using the search engine, you have to set up the following tools:

  • Kafka
  • Hadoop
  • HBase
  • ZooKeeper
  • Elasticsearch
  • Redis

A complete explanation of which versions to use and how to install them is available on the wiki page.

Installing

  • Download and unzip the project.
  • Download and install Maven 3 or later.
  • Build the .jar files by running mvn clean package -DskipTests in the source directory. This creates a fat jar in the target directory of each module.
  • Run a jar file with the java -jar *.jar command.

Built With

Authors

  • Alireza Asadi
  • Hamidreza Sharifzadeh
  • Mohammad Kazem Faghih Khorasani
  • Mostafa Ojaghi

See also the list of contributors who participated in this project.