REST service for verifying uploaded archives

SNPSEQ Archive Verify

A self-contained (aiohttp) REST service that helps verify uploaded SNP&SEQ archives by first downloading the archive from PDC and then comparing the MD5 sums of all associated files. The downloaded files are deleted on successful verification and retained if any error occurs.
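
The comparison step can be pictured with a minimal sketch. It assumes a checksum manifest in md5sum format (one "<hex digest>  <relative path>" entry per line) stored inside the archive directory. The manifest name checksums.md5 and the helpers md5_of_file and verify_archive are illustrative assumptions, not the service's actual code.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(archive_dir, manifest_name="checksums.md5"):
    """Return the relative paths that are missing or whose MD5 sum differs.

    The manifest name is a hypothetical default chosen for this sketch.
    """
    archive_dir = Path(archive_dir)
    failures = []
    for line in (archive_dir / manifest_name).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.lstrip("*")  # md5sum marks binary mode with '*'
        target = archive_dir / rel_path
        if not target.is_file() or md5_of_file(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty list means every file listed in the manifest matched. A non-empty list names the files that would cause the verification to fail.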

The service consists of three components, all of which must be running. The system must also allow them to communicate over the configured protocols and ports (refer to the Redis documentation for details). The components are:

  • archive-verify-ws REST service
  • Redis queue server
  • RQ worker

The web service enqueues certain job functions in the RQ/Redis queue, where they get picked up by the separate RQ worker process.
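
The split between the web service and the worker can be illustrated with a small stand-in. This sketch deliberately does not use RQ or Redis: Python's standard queue.Queue plays the role of the Redis-backed queue, and a thread plays the role of the separate RQ worker process. All names here (verify_archive_job, enqueue, worker, run_demo) are hypothetical and only show the pattern.

```python
import queue
import threading

job_queue = queue.Queue()  # stand-in for the Redis-backed RQ queue
results = {}               # stand-in for the job state kept in Redis


def verify_archive_job(archive):
    """Hypothetical job function; a real one would download and check MD5 sums."""
    return f"verified {archive}"


def enqueue(job_id, func, *args):
    """Web service side: record the job as queued and return immediately."""
    results[job_id] = "queued"
    job_queue.put((job_id, func, args))


def worker():
    """Worker side: pick up jobs one at a time until a stop sentinel arrives."""
    while True:
        job_id, func, args = job_queue.get()
        if func is None:  # sentinel: stop the worker
            break
        results[job_id] = func(*args)


def run_demo():
    """Enqueue one job, let the worker process it, and return its result."""
    thread = threading.Thread(target=worker)
    thread.start()
    enqueue("job-1", verify_archive_job, "my_001XBC_archive")
    job_queue.put((None, None, ()))  # ask the worker to stop
    thread.join()
    return results["job-1"]
```

The point of the design is that the request handler never runs the slow job itself. It only places the job on the queue, so the HTTP call returns quickly while the worker does the download and comparison.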

Pre-requisites

You will need Python >= 3.9 and Redis.

Download and install Redis.

Install

It is recommended to set up the service in a virtual environment. venv is used below with bash on a Linux system.

python3 -m venv --upgrade-deps .venv
source .venv/bin/activate
pip install .

Running the service

Start the Redis server and RQ worker:

redis-server
rq worker

Start the REST service:

archive-verify-ws -c=config/

Mock Downloading

If you are running this service locally and don't have IBM's dsmc client installed, you can skip the downloading step and verify an archive that is already on your machine.

To use this method:

  • Copy an archive that has been pre-downloaded from PDC into the verify_root_dir set in app.yml.

  • Delete or edit some files in the archive if you wish to trigger a validation error.

  • In app.yml, set:

    pdc_client: "MockPdcClient"

Note that the archive will be deleted from verify_root_dir on successful verification.

Naming Conventions for Mock Download

Note that when an archive is downloaded from PDC using snpseq-archive-verify, the download directory is named after the archive, followed by the RQ job ID:

{verify_root_dir}/{archive_name}_{rq_job_id}

When mocking downloading, we search verify_root_dir for archive_name and use the first directory found, ignoring the rq_job_id.
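
The naming convention and the mock lookup can be sketched as follows. This is an illustration under assumptions: the helper names download_dir and find_mock_download are hypothetical, and matching on a name prefix in sorted order is one plausible reading of "the first directory found", not necessarily what the service does.

```python
from pathlib import Path


def download_dir(verify_root_dir, archive_name, rq_job_id):
    """Path used for a real download: {verify_root_dir}/{archive_name}_{rq_job_id}."""
    return Path(verify_root_dir) / f"{archive_name}_{rq_job_id}"


def find_mock_download(verify_root_dir, archive_name):
    """Return the first directory whose name starts with archive_name, or None.

    Assumption: entries are examined in sorted order and matched by prefix,
    so any rq_job_id suffix is ignored.
    """
    for entry in sorted(Path(verify_root_dir).iterdir()):
        if entry.is_dir() and entry.name.startswith(archive_name):
            return entry
    return None
```

Under these assumptions, an archive copied into verify_root_dir is found whether or not its directory name still carries a job id suffix from an earlier real download.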

Running tests

source .venv/bin/activate
pip install -e .[test]
nosetests tests/

REST endpoints

Enqueue a verification job of a specific archive:

curl -i -X "POST" -d '{"host": "my-host", "description": "my-descr", "archive": "my_001XBC_archive"}' http://localhost:8989/api/1.0/verify

Enqueue a download job of a specific archive:

curl -i -X "POST" -d '{"host": "my-host", "description": "my-descr", "archive": "my_001XBC_archive"}' http://localhost:8989/api/1.0/download

Check the current status of an enqueued job:

curl -i -X "GET" http://localhost:8989/api/1.0/status/<job-uuid-returned-from-verify-endpoint>
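
The same calls can be made from Python. The sketch below mirrors the curl commands using only the standard library. The helper names are hypothetical, and because the response format is not described here, send returns the raw status code and body text instead of parsing specific fields.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8989/api/1.0"


def build_job_request(endpoint, host, description, archive, base_url=BASE_URL):
    """Build a POST request for the 'verify' or 'download' endpoint."""
    body = json.dumps(
        {"host": host, "description": description, "archive": archive}
    )
    return urllib.request.Request(
        f"{base_url}/{endpoint}", data=body.encode("utf-8"), method="POST"
    )


def build_status_request(job_id, base_url=BASE_URL):
    """Build a GET request for the status of a previously enqueued job."""
    return urllib.request.Request(f"{base_url}/status/{job_id}", method="GET")


def send(request):
    """Send a request and return (HTTP status code, response body as text).

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

For example, send(build_job_request("verify", "my-host", "my-descr", "my_001XBC_archive")) enqueues a verification job against a locally running service, and the job id found in the response can then be passed to build_status_request.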