
Primary language: Java

CS242 Project

Part A

Run by JAR

To run everything from the packaged cs242.jar:

cd bin/
java -jar cs242.jar <subroutine> [options] <arguments...>

Run by class

To compile all sources and use the Launcher to select which subroutine to run:

javac -cp "./:./lib/commons-cli-1.4.jar:./lib/jsoup-1.11.2.jar:./lib/sqlite-jdbc-3.21.0.jar:./lib/lucene-analyzers-common-7.2.1.jar:./lib/lucene-core-7.2.1.jar" src/edu/ucr/cs242/*.java src/edu/ucr/cs242/crawler/*.java src/edu/ucr/cs242/indexing/*.java
java -cp "./src:./lib/commons-cli-1.4.jar:./lib/jsoup-1.11.2.jar:./lib/sqlite-jdbc-3.21.0.jar:./lib/lucene-analyzers-common-7.2.1.jar:./lib/lucene-core-7.2.1.jar" edu.ucr.cs242.Launcher <subroutine> [options] <arguments...>
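The classpath strings in these commands use `:` as the separator, which is the Unix/macOS convention. On Windows the JVM expects `;` instead. The sketch below is a minimal illustration, and the class name ClasspathExample is hypothetical. It builds the same run-time classpath with `File.pathSeparator` so that the separator matches the current platform:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class ClasspathExample {
    // Entries copied from the `java -cp` command above (hypothetical helper class).
    static final List<String> ENTRIES = Arrays.asList(
        "./src",
        "./lib/commons-cli-1.4.jar",
        "./lib/jsoup-1.11.2.jar",
        "./lib/sqlite-jdbc-3.21.0.jar",
        "./lib/lucene-analyzers-common-7.2.1.jar",
        "./lib/lucene-core-7.2.1.jar");

    // Joins entries with the platform separator: ':' on Unix/macOS, ';' on Windows.
    static String buildClasspath(List<String> entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        System.out.println(buildClasspath(ENTRIES));
    }
}
```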

Available subroutines:

  • crawler: Execute the Wikipedia crawler
  • indexer: Execute the Lucene indexer
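The Launcher's source is not reproduced here. What follows is only a hypothetical sketch of how a launcher like this might work: it treats the first argument as the subroutine name and forwards the remaining `[options] <arguments...>` to that subroutine. The names LauncherSketch, register and dispatch are assumptions. In the real project, the registered entry points would presumably be the crawler's and indexer's main methods.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

public class LauncherSketch {
    // Hypothetical registry: subroutine name -> entry point receiving the remaining args.
    private final Map<String, Consumer<String[]>> subroutines = new LinkedHashMap<>();

    void register(String name, Consumer<String[]> entryPoint) {
        subroutines.put(name, entryPoint);
    }

    // Invokes the named subroutine; returns false if the name is missing or unknown.
    boolean dispatch(String[] args) {
        if (args.length == 0 || !subroutines.containsKey(args[0])) {
            System.err.println("Available subroutines: " + subroutines.keySet());
            return false;
        }
        // Forward everything after the subroutine name: [options] <arguments...>
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        subroutines.get(args[0]).accept(rest);
        return true;
    }

    public static void main(String[] args) {
        LauncherSketch launcher = new LauncherSketch();
        // Placeholders; a real launcher would call the crawler/indexer entry points.
        launcher.register("crawler", a -> System.out.println("crawler " + Arrays.toString(a)));
        launcher.register("indexer", a -> System.out.println("indexer " + Arrays.toString(a)));
        launcher.dispatch(args);
    }
}
```

A map-based registry keeps the list of subroutines in one place, so the "available subroutines" message printed on a bad name always matches what can actually be run.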

To compile and run the WikiCrawler:

javac -cp "./:./lib/commons-cli-1.4.jar:./lib/jsoup-1.11.2.jar:./lib/sqlite-jdbc-3.21.0.jar" src/edu/ucr/cs242/*.java src/edu/ucr/cs242/crawler/*.java
java -cp "./src:./lib/commons-cli-1.4.jar:./lib/jsoup-1.11.2.jar:./lib/sqlite-jdbc-3.21.0.jar" edu.ucr.cs242.crawler.WikiCrawler <jdbc-url>

The jdbc-url argument is required to run the crawler. An example jdbc-url:

jdbc:sqlite:pages.db

which will create a SQLite database named pages.db in the directory where the command above is run.
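With the sqlite-jdbc driver, a connection for such a URL is opened with `DriverManager.getConnection(url)`. A relative file path such as pages.db is resolved against the working directory of the JVM process. The sketch below uses a hypothetical helper called sqliteFile to show which file a given URL points at. It is an illustration only: it handles plain file paths and ignores special forms such as `:memory:` or URL query parameters.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class JdbcUrlExample {
    private static final String PREFIX = "jdbc:sqlite:";

    // Hypothetical helper: returns the database file a SQLite JDBC URL refers to,
    // resolving a relative path against the current working directory (user.dir).
    static Path sqliteFile(String jdbcUrl) {
        if (!jdbcUrl.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a SQLite JDBC URL: " + jdbcUrl);
        }
        Path file = Paths.get(jdbcUrl.substring(PREFIX.length()));
        return file.toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "jdbc:sqlite:pages.db";
        System.out.println(url + " -> " + sqliteFile(url));
    }
}
```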


To compile and run the Indexer:

javac -cp "./:./lib/commons-cli-1.4.jar:./lib/lucene-analyzers-common-7.2.1.jar:./lib/lucene-core-7.2.1.jar:./lib/sqlite-jdbc-3.21.0.jar" src/edu/ucr/cs242/*.java src/edu/ucr/cs242/indexing/*.java
java -cp "./src:./lib/commons-cli-1.4.jar:./lib/lucene-analyzers-common-7.2.1.jar:./lib/lucene-core-7.2.1.jar:./lib/sqlite-jdbc-3.21.0.jar" edu.ucr.cs242.indexing.Indexer <jdbc-url> <index-output-path>

The jdbc-url is the same one passed to WikiCrawler (the database the crawler populated). The index-output-path is the directory where Lucene writes its index, and it should be created before running the Indexer.

An example index-output-path:

./index/

which will save the index files generated by Lucene in the index folder under the current directory.
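Because the Indexer expects the output directory to exist already, it must be created first. On Unix, `mkdir -p index` does this. The following minimal sketch does the same thing portably with java.nio, using a hypothetical helper called prepareIndexDir. It also checks that the result is a writable directory before indexing starts.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class IndexDirExample {
    // Hypothetical helper: creates the index output directory (and missing parents),
    // then verifies it is a writable directory.
    static Path prepareIndexDir(String indexOutputPath) throws IOException {
        Path dir = Paths.get(indexOutputPath);
        Files.createDirectories(dir); // does not fail if the directory already exists
        if (!Files.isDirectory(dir) || !Files.isWritable(dir)) {
            throw new IOException("Index path is not a writable directory: " + dir);
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(prepareIndexDir(args.length > 0 ? args[0] : "./index/"));
    }
}
```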


To clean up the compiled .class files:

find src/ -type f -name "*.class" -delete
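The find command above only works on Unix-like systems. On platforms without find, such as Windows, the small Java sketch below can be used instead. It uses a hypothetical class called CleanClasses that walks src/ and deletes the same regular *.class files. It is an illustrative alternative, not part of the project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CleanClasses {
    // Deletes every regular *.class file under root, mirroring
    // `find src/ -type f -name "*.class" -delete`. Returns the number of files removed.
    static int deleteClassFiles(Path root) throws IOException {
        List<Path> classFiles;
        try (Stream<Path> paths = Files.walk(root)) { // close the stream to release directory handles
            classFiles = paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".class"))
                .collect(Collectors.toList());
        }
        for (Path p : classFiles) {
            Files.delete(p);
        }
        return classFiles.size();
    }

    public static void main(String[] args) throws IOException {
        int removed = deleteClassFiles(Paths.get("src"));
        System.out.println("Removed " + removed + " .class files");
    }
}
```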