The search engine is a complete implementation of a website crawler. Links are taken from a queue and, after politeness and duplicate checks, their HTML documents are fetched and parsed. Finally, the documents are saved into databases. The search engine is configured for our own use case, but you can change it. You can set up the necessary tools by following our wiki page.
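The crawl flow described above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the class `CrawlSketch`, the `POLITENESS_MS` delay, and the in-memory queue, sets, and maps are all hypothetical stand-ins for the real queue, caches, and databases, and the fetcher is passed in as a function so the sketch runs without network access.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: in-memory stand-ins for the queue, caches, and databases.
class CrawlSketch {
    // Assumed per-host delay; the real value is a project setting.
    static final long POLITENESS_MS = 30_000;
    static final Pattern HREF = Pattern.compile("href=\"(https?://[^\"]+)\"");

    final Queue<String> frontier = new ArrayDeque<>();   // link queue
    final Set<String> seen = new HashSet<>();            // duplicate check
    final Map<String, Long> lastVisit = new HashMap<>(); // host -> last fetch time
    final Map<String, String> store = new HashMap<>();   // saved documents

    /** Processes one queued link; returns true if a page was fetched and saved. */
    boolean step(Function<String, String> fetcher, long nowMs) {
        String link = frontier.poll();
        if (link == null || seen.contains(link)) {
            return false; // empty queue or duplicate link
        }
        String host = URI.create(link).getHost();
        Long last = lastVisit.get(host);
        if (last != null && nowMs - last < POLITENESS_MS) {
            frontier.add(link); // host was visited too recently: retry later
            return false;
        }
        lastVisit.put(host, nowMs);
        String html = fetcher.apply(link);   // fetch
        seen.add(link);
        store.put(link, html);               // save
        frontier.addAll(extractLinks(html)); // parse and enqueue new links
        return true;
    }

    /** Naive link extraction; a real crawler would use an HTML parser. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        CrawlSketch crawler = new CrawlSketch();
        crawler.frontier.add("http://example.com/");
        // Fake fetcher that returns a page with a single outgoing link.
        boolean saved = crawler.step(
                url -> "<a href=\"http://example.org/\">next</a>",
                System.currentTimeMillis());
        System.out.println("saved=" + saved + ", queued=" + crawler.frontier);
    }
}
```

A link rejected by the politeness check is put back at the end of the queue instead of being dropped, so it is retried once the delay has passed.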
First of all, DON’T PANIC. It will take 5 minutes to get the gist of what DataPirates SearchEngine is all about.
Before using the search engine, you have to set up the following tools:
- kafka
- hadoop
- hbase
- zookeeper
- elasticsearch
- redis
A complete explanation of which versions to use and how to install them is available on the wiki page.
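Once the tools are installed, a quick way to confirm that they are running is to try opening a TCP connection to each one. The sketch below is not part of the project: the class `ServiceCheck` is hypothetical, and the host and ports are assumptions based on common default ports, which may differ from the setup described on the wiki. Hadoop and HBase are left out because their ports vary more between versions and configurations.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: checks whether a TCP port accepts connections.
class ServiceCheck {
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host not resolvable
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; adjust to match your own installation.
        Map<String, Integer> services = new LinkedHashMap<>();
        services.put("kafka", 9092);
        services.put("zookeeper", 2181);
        services.put("elasticsearch", 9200);
        services.put("redis", 6379);

        for (Map.Entry<String, Integer> s : services.entrySet()) {
            boolean up = isReachable("localhost", s.getValue(), 1000);
            System.out.printf("%-14s port %-5d %s%n",
                    s.getKey(), s.getValue(), up ? "reachable" : "NOT reachable");
        }
    }
}
```

An open port only shows that something is listening there; it does not confirm that the service is the right version or correctly configured.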
- Download and unzip the project.
- Download and install Maven 3 or later.
- Create the .jar files by running
mvn clean package -DskipTests
in the source directory. This creates a fat jar in the target directory of each module.
- Run a jar file with
java -jar *.jar
- Maven 3.6.0 - Dependency Management
- Alireza Asadi
- Hamidreza Sharifzadeh
- Mohammad Kazem Faghih Khorasani
- Mostafa Ojaghi
See also the list of contributors who participated in this project.