/pkppln

Tools used in the PKP PLN

Primary LanguagePythonGNU General Public License v3.0GPL-3.0

PKP PLN Staging Software

Code and example data related to the Public Knowlege Project Private LOCKSS Network.

More info is available on this repo's wiki.

Testing

There is a sample database schema in pkppln.sql - load it into a MySQL database and edit config_test.cfg to point at the test database.

The automatic tests require a running WSGI instance. Luckily bottle.py makes this easy. Start a test server from a terminal window:

python server.py config_test.cfg

Then, in a second terminal window, run the unit tests with automatic discovery:

python -m unittest discover tests

Configuration

This is a minimal configuration for Apache to run the server.py script as a WSGI.

 <VirtualHost 127.0.0.1>
  ServerName pkppln.dvh

  WSGIDaemonProcess pkppln.dvh processes=2 threads=15
  WSGIProcessGroup  pkppln.dvh
  
  WSGIScriptAlias / /path/to/pkppln/server.py
</VirtualHost>

This will run the server.py with configutation data from config.cfg in the same directory. You will need to update the config.cfg file with your actual configuration data.

Microservices

The server accepts SWORD deposits with a link to a BagIt file. The SWORD deposit also contains some metadata. The staging server runs each deposit through number of "microservices" to validate the data in different ways and prepare it for deposit to a LOCKSSOMatic instance. The services are:

harvest
Download the deposit BagIt file.
validate_payload
Check the file size and checksum of the BagIt file against the metadata in the SWORD deposit.
validate_bag
Extract the contents of the BagIt file and validate it.
virus_check
Check the content of the deposit with ClamAV's clamd.
validate_export
Validate the OJS export XML.
reserialize_bag
Add the results of validation and virus checking to the BagIt data, and serialize it into a new BagIt file.
stage_bag
Move the new BagIt file to the staging location.
deposit_to_pln
Create a SWORD deposit on a LOCKSSOMatic instance for the staged BagIt file.
check_status
Check the status of the deposit on the LOCKSSOMatic instance.

Services are run via pln-service.py.

usage: pln-service.py [-h] [-v | -q] [-n | -f] [-d DEPOSIT] service

Run a staging service

positional arguments:
  service               Name of the service to run

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Increase output verbosity
  -q, --quiet           Silence most output
  -n, --dry-run         Do not update the deposit states
  -f, --force           Force updates to the deposit states.
  -d DEPOSIT, --deposit DEPOSIT
                        Run the service on one or more deposits

Commands

There are a number of convenience commands for querying the list of deposits. They are:

journal_history
Show all deposits for a journal.
journal_info
Show metadata for a journal.
list_commands
List the available commands.
list_deposits
List all deposits.
list_journals
List all the journals that have ever made a deposit.
list_services
List all the services in the order they are applied to a deposit.
process
Process one deposit through all the services in the appropriate order
reset_deposit
Reset a deposit to a processing stage
service_log
Show all service actions against a deposit.
usage: pln-command.py [-h] [-v | -q] command ...

Run a staging command

positional arguments:
  command        Name of the command to run
  subargs        Arugments to subcommand

optional arguments:
  -h, --help     show this help message and exit
  -v, --verbose  Increase output verbosity
  -q, --quiet    Silence most output

Use pln-command.py list_commands for a list of available commands

All commands accept -h/--help as an argument:

$ ./pln-command.py journal_info --help
usage: pln-command.py [global options] journal_info [command options]

Report all known journal metadata.

positional arguments:
  uuid        Journal UUID

optional arguments:
  -h, --help  show this help message and exit