/IR14Assignment2

Desktop application that crawls the Web for code snippets, beginning with a set of seed URLs and complying, for the most part, with the robots.txt standard. It uses Lucene to index the web pages and to support queries over the crawled data. Different indexes can be created, stored, and loaded, allowing for specialized searches. It was developed as a mini-project for an Information Retrieval course, WiSe2014-2015@OvGU.
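As a rough illustration of the crawling idea described above (seed URLs plus a robots.txt check), here is a minimal sketch. It is not taken from the project's source: the class `CrawlSketch`, its methods, and the in-memory link map that stands in for real HTTP fetching are all hypothetical. The robots.txt handling is deliberately simplified and only honours `Disallow` prefixes listed under `User-agent: *`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from the project's source.
class CrawlSketch {

    // Collects the Disallow prefixes of the "User-agent: *" group.
    // Simplification: wildcards, Allow lines and per-agent groups are ignored.
    static List<String> parseDisallows(String robotsTxt) {
        List<String> disallowed = new ArrayList<>();
        boolean inWildcardGroup = false;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            int hash = raw.indexOf('#');                       // strip comments
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) continue;                           // not a "field: value" line
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                inWildcardGroup = value.equals("*");
            } else if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                disallowed.add(value);
            }
        }
        return disallowed;
    }

    // A path is allowed unless it starts with one of the disallowed prefixes.
    static boolean isAllowed(String path, List<String> disallowed) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    // Breadth-first crawl from the seeds. The 'links' map (page -> outgoing
    // links) is an assumed stand-in for fetching and parsing real pages.
    static List<String> crawl(List<String> seeds, Map<String, List<String>> links,
                              List<String> disallowed, int maxPages) {
        Deque<String> frontier = new ArrayDeque<>(seeds);
        Set<String> visited = new LinkedHashSet<>();
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String path = frontier.poll();
            if (visited.contains(path) || !isAllowed(path, disallowed)) continue;
            visited.add(path);                                 // "fetch" the page
            frontier.addAll(links.getOrDefault(path, Collections.<String>emptyList()));
        }
        return new ArrayList<>(visited);
    }

    public static void main(String[] args) {
        List<String> rules = parseDisallows("User-agent: *\nDisallow: /private/\n");
        Map<String, List<String>> links = new HashMap<>();
        links.put("/index.html", Arrays.asList("/snippets/a.html", "/private/b.html"));
        links.put("/snippets/a.html", Arrays.asList("/index.html", "/snippets/c.html"));
        System.out.println(crawl(Arrays.asList("/index.html"), links, rules, 10));
    }
}
```

In a real crawler the visited pages would then be handed to an indexer (Lucene, in this project); that step is left out here because it depends on the Lucene version in use.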

Primary language: Java. License: Apache License 2.0 (Apache-2.0).

IR14Assignment2

Repository for the master's assignment in the IR course, WiSe2014-2015@OvGU.

The main class can be found in ir.control>CodeSearch.

The program itself is self-contained and should be easy to use.

Additional documentation regarding the source code, including class diagrams, can be found in the doc folder.