Final Project for COMP 4321 - Group 4
Contributors: Jia Lu, Sam Baltrus, Adam Feuer
- DB files can be found in the "rocksDB files" folder. The DB containing the 30 indexed pages, starting from http://www.cse.ust.hk/, can be found in the "pageIDToURL" folder.
- spider_result.txt and the PDF of the RocksDB schema can be found in the current directory.
- Source code of the spider program: searchengine/src/main/java/SE/Crawler.java
- Source code of the test program: searchengine/src/main/java/SEtests/SpiderTest.java
Install Maven on the machine.
Either:
Follow this guide to set up
Or use the command:
sudo yum install maven
Double-check that all the paths are correct.
Run:
mvn -version
to check that Maven is installed correctly.
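The check above can also be scripted. The following is a minimal sketch, not part of the project: the function names have_cmd and check_maven are hypothetical, and it only assumes a POSIX shell. It reports whether mvn is reachable on PATH before printing its version.

```shell
# Hypothetical helper (not part of the project): succeed only if the
# named command can be found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print the Maven version if mvn is reachable,
# otherwise print a hint to stderr and return a non-zero status.
check_maven() {
    if have_cmd mvn; then
        mvn -version
    else
        echo "mvn not found on PATH; install Maven or fix PATH" >&2
        return 1
    fi
}
```

After defining the functions in the current shell, call check_maven. A non-zero exit status suggests that Maven is missing or that the paths need another look.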
Move into the 'searchengine' folder:
cd searchengine
To compile, run:
mvn compile
To run the Spider Crawler, run:
mvn exec:java -Dexec.mainClass=SE.Crawler
To run the Spider Test, run:
mvn exec:java -Dexec.mainClass=SEtests.SpiderTest
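The compile and run steps above can be chained so that a failure stops the sequence. This is only a sketch under stated assumptions: run_step, build_and_run, and the DRY_RUN variable are hypothetical names introduced here, while the mvn commands are the ones listed above. It must be run from inside the 'searchengine' folder.

```shell
# Hypothetical helper: echo a command, then execute it.
# With DRY_RUN=1 the command is printed but not executed.
run_step() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        return 0
    fi
    "$@"
}

# Hypothetical helper: compile, run the crawler, then run the test.
# The && chain stops at the first step that fails.
build_and_run() {
    run_step mvn compile &&
    run_step mvn exec:java -Dexec.mainClass=SE.Crawler &&
    run_step mvn exec:java -Dexec.mainClass=SEtests.SpiderTest
}
```

After defining the functions, run build_and_run from the 'searchengine' folder. Setting DRY_RUN=1 first prints the three commands without executing them, which is a quick way to preview what will run.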