The OAI-PMH Shell Harvester is able to harvest OAI-PMH targets. It supports multiple configurable targets which can be updated individually. Furthermore, it is able to execute a preset command for each record it updates or deletes. Installing shell-harvester: - dependencies: xsltproc, bc, curl, coreutils (namely for md5sum and shuf) - git clone git://github.com/wimmuskee/shell-oaiharvester.git - use Makefile or: - cp oaiharvester /usr/bin/oaiharvester - mkdir /usr/share/shell-oaiharvester - cp libs/* /usr/share/shell-oaiharvester/. - cp config.xml /etc/shell-oaiharvester/config.xml Using shell-harvester: 1. Configure the harvester Edit the paths in the config.xml file, records are stored in <recordpath>/<repository_id> . 2. Configure a repository Edit a repository in the config file. By default, the example file in /etc/shell-oaiprovider is used. You use your own by calling: $ oaiharvester -c <config file> Some notes on the options: - deletecmd: executed before a record is deleted, optional - updatecmd: executed after a record is updated, optional - ${identifier} holds the identifier - ${filename) is the identifier with optional xz extension - ${path} holds the absolute path of the record - repository update and delete cmd's are executed before the generic ones if available - set, from, until, conditional and recordpath are optional 3. Run the harvester $ oaiharvester -r <repository_id> 4. Only retrieve identifiers $ oaiharvester -r <repository_id> -n 5. Not harvest, but validate the repository $ oaiharvester -r <repository_id> -t *. Unless customized, the log file of the harvest process is stored at /tmp/shell-harvester/log.csv. The logger uses a simple csv format with a line for each downloaded page. YYYY-MM-DD HH:MM:SS,repository,record count,download time,process time The logtype can be configured in the harvester config file, either "combined" putting all instance logs in one file (default), or "instance", and have a log file for each started instance