Crawl RSS - Heritrix 3 add-on

NOTE: This add-on will only work with Heritrix 3.3.0 or later.

Installation

  1. Download the code

  2. Run "mvn package". This generates a distribution tar.gz file.

  3. Extract the archive from step 2 into the root directory of a Heritrix (3.3.0+) instance.

  4. Start Heritrix as usual.

  5. Base your job on the supplied profile "CrawlRSS-Sample-Profile".
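
Steps 2 and 3 above can be sketched as a small shell function. This is only an illustration under stated assumptions: the HERITRIX_HOME and CRAWLRSS_SRC variables, their default paths, and the helper names are hypothetical, and the archive is assumed to land in Maven's default "target" directory. Adjust all of these to your setup.

```shell
#!/bin/sh
# Sketch only. Both paths below are hypothetical defaults, not part of the add-on.
# HERITRIX_HOME: root directory of a Heritrix 3.3.0+ instance.
# CRAWLRSS_SRC:  directory containing the downloaded CrawlRSS code (step 1).
HERITRIX_HOME="${HERITRIX_HOME:-$HOME/heritrix}"
CRAWLRSS_SRC="${CRAWLRSS_SRC:-$HOME/crawlrss}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_crawlrss() {
  # Step 2: build the distribution tar.gz.
  run mvn -f "$CRAWLRSS_SRC/pom.xml" package || return 1

  # Step 3: extract the archive into the Heritrix root directory.
  # Assumption: Maven writes the archive to target/ (its default output directory).
  for archive in "$CRAWLRSS_SRC"/target/*.tar.gz; do
    run tar -xzf "$archive" -C "$HERITRIX_HOME" || return 1
  done

  # Steps 4 and 5 (starting Heritrix, choosing the profile) are done as usual.
}
```

To preview the commands without running them, set DRY_RUN=1 before calling install_crawlrss; call it without DRY_RUN to perform the build and extraction.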