To run the whole project from cs242.jar:
cd bin/
java -jar cs242.jar <subroutine> [options] <arguments...>
To compile the sources and use the Launcher to select which subroutine to run:
javac -cp "./:./lib/commons-cli-1.4.jar:./lib/jsoup-1.11.2.jar:./lib/sqlite-jdbc-3.21.0.jar:./lib/lucene-analyzers-common-7.2.1.jar:./lib/lucene-core-7.2.1.jar" src/edu/ucr/cs242/*.java src/edu/ucr/cs242/crawler/*.java src/edu/ucr/cs242/indexing/*.java
java -cp "./src:./lib/commons-cli-1.4.jar:./lib/jsoup-1.11.2.jar:./lib/sqlite-jdbc-3.21.0.jar:./lib/lucene-analyzers-common-7.2.1.jar:./lib/lucene-core-7.2.1.jar" edu.ucr.cs242.Launcher <subroutine> [options] <arguments...>
Available subroutines:
- crawler: Execute the Wikipedia crawler
- indexer: Execute the Lucene indexer
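The classpath strings in the commands above list each JAR by hand. As a minimal sketch, assuming all dependency JARs sit in ./lib as they do in those commands, a small shell helper can assemble an equivalent string. The name build_classpath is hypothetical and not part of the project.

```shell
# Hypothetical helper (not part of the project): build a classpath made of
# ./src plus every JAR found in a directory (default: ./lib).
build_classpath() {
    jar_dir="${1:-./lib}"
    classpath="./src"
    for jar in "$jar_dir"/*.jar; do
        # Skip the unexpanded pattern when the directory has no JARs.
        if [ -e "$jar" ]; then
            classpath="$classpath:$jar"
        fi
    done
    printf '%s\n' "$classpath"
}

# Usage, from the project root:
#   CP="$(build_classpath)"
#   javac -cp "$CP" src/edu/ucr/cs242/*.java src/edu/ucr/cs242/crawler/*.java src/edu/ucr/cs242/indexing/*.java
#   java -cp "$CP" edu.ucr.cs242.Launcher <subroutine> [options] <arguments...>
```

This puts every JAR in ./lib on the classpath, which may be more than a given command strictly needs.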
To compile and run the WikiCrawler:
javac -cp "./:./lib/commons-cli-1.4.jar:./lib/jsoup-1.11.2.jar:./lib/sqlite-jdbc-3.21.0.jar" src/edu/ucr/cs242/*.java src/edu/ucr/cs242/crawler/*.java
java -cp "./src:./lib/commons-cli-1.4.jar:./lib/jsoup-1.11.2.jar:./lib/sqlite-jdbc-3.21.0.jar" edu.ucr.cs242.crawler.WikiCrawler <jdbc-url>
The jdbc-url argument is required to run the WikiCrawler. An example of jdbc-url:
jdbc:sqlite:pages.db
which will create an SQLite database named pages.db in the directory where the command above is run.
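The part of the example URL after the jdbc:sqlite: prefix is the database file path. As a rough sketch, assuming the example URL above and a relative path, the following checks whether the crawler has already produced that file. The variable names are illustrative only.

```shell
# Assumption: the example URL from above; adjust to the URL actually used.
JDBC_URL="jdbc:sqlite:pages.db"

# Strip the "jdbc:sqlite:" prefix to get the database file path.
DB_FILE="${JDBC_URL#jdbc:sqlite:}"

if [ -f "$DB_FILE" ]; then
    echo "database found: $DB_FILE"
else
    echo "database not found yet: $DB_FILE"
fi
```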
To compile and run the Indexer:
javac -cp "./:./lib/commons-cli-1.4.jar:./lib/lucene-analyzers-common-7.2.1.jar:./lib/lucene-core-7.2.1.jar:./lib/sqlite-jdbc-3.21.0.jar" src/edu/ucr/cs242/*.java src/edu/ucr/cs242/indexing/*.java
java -cp "./src:./lib/commons-cli-1.4.jar:./lib/lucene-analyzers-common-7.2.1.jar:./lib/lucene-core-7.2.1.jar:./lib/sqlite-jdbc-3.21.0.jar" edu.ucr.cs242.indexing.Indexer <jdbc-url> <index-output-path>
The jdbc-url is the same as the one used for WikiCrawler, while index-output-path is the directory where Lucene writes its index. This directory should be created before running the Indexer.
An example of index-output-path:
./index/
which will save all the indexes generated by Lucene in the folder index under the current directory.
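Because the index output path should exist before the Indexer runs, one way to prepare it is sketched below. It assumes the example path ./index/, and INDEX_DIR is an illustrative variable name.

```shell
# Assumption: the example index path from above.
INDEX_DIR="./index/"

# -p creates the directory if it is missing and does nothing if it already exists.
mkdir -p "$INDEX_DIR"
```

The same path can then be passed to the Indexer as index-output-path.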
To perform a cleanup:
find src/ -type f -name "*.class" -delete