caltechlibrary/dibsiiif

dibsiiif should process one barcode at a time


The concurrency problem

The current version of iiifify.sh has a concurrency problem:

  • it is intended to be run concurrently (thus the “run once a minute in cron” instruction)
  • its for-loop builds a list of multiple barcodes, but it can only process those barcodes one at a time, so the later barcodes in the list stay unclaimed while the earlier ones are processed

This leads dibsiiif.py to throw errors in cases where it shouldn't. Assume you have 3 books to process and it takes longer than a minute to process a book. Now suppose one iiifify.sh job runs in cron and decides to loop over books 1, 2, and 3, and a minute later a second iiifify.sh job runs in cron and decides to loop over books 2 and 3.

When the first job finishes book 1 and tries to start book 2, dibsiiif.py will throw an error (because the 2-processing file already exists) and it will create a 2-problem file, which means that users of the app will see a red exclamation mark icon in the dibs item listing for book 2. The first job will then move on to book 3 (assuming no other job has picked up book 3 by now).

When the second job finishes book 2 and tries to start book 3, some other job may have already created the 3-processing file. In that situation, dibsiiif.py will throw an error (because the 3-processing file already exists) and create a 3-problem file, which means that users of the app will see a red exclamation mark icon in the dibs item listing for book 3.

As a result, users of the dibs web app must ask their system administrators to remove the problem and processing files for those items before they can proceed.
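The collision described above can be modeled in a few lines. This is only a sketch of the behavior as described in this issue, not the actual dibsiiif.py code: the function name `start_processing`, the flat status directory, and the `<barcode>-processing` / `<barcode>-problem` file names are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def start_processing(status_dir: Path, barcode: str) -> bool:
    """Hypothetical model of the check described in this issue.

    If a <barcode>-processing file already exists, a <barcode>-problem
    file is written and the attempt fails; otherwise the barcode is
    claimed by creating the <barcode>-processing file.
    """
    processing = status_dir / f"{barcode}-processing"
    if processing.exists():
        # Another job already claimed this barcode: flag it as a problem.
        (status_dir / f"{barcode}-problem").touch()
        return False
    processing.touch()
    return True


# Replay the scenario: the second cron job claims book 2 while the
# first job is still busy with book 1.
with tempfile.TemporaryDirectory() as tmp:
    status = Path(tmp)
    first_job_book_1 = start_processing(status, "1")   # first job starts book 1
    second_job_book_2 = start_processing(status, "2")  # second job starts book 2
    first_job_book_2 = start_processing(status, "2")   # first job reaches book 2
    book_2_flagged = (status / "2-problem").exists()
```

In this model the first job's attempt on book 2 fails and leaves a 2-problem file behind, even though the second job is processing book 2 normally.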

The concurrency solution

We propose the following changes to the way dibsiiif processes multiple barcodes, in a PR coming soon:

  • each invocation of iiifify.sh via cron processes only a single barcode directory
  • the single barcode chosen will be the one with the oldest barcode-initiated file, by timestamp, in the status/files location, so that directories are processed in the order requested rather than alphabetically
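The selection rule in the second bullet could look roughly like the following. This is a minimal sketch rather than the code from the upcoming PR: it assumes the status files are named `<barcode>-initiated`, that they sit directly in one directory, and that "timestamp" means the file's modification time. The function name `oldest_initiated_barcode` is hypothetical.

```python
from pathlib import Path
from typing import Optional

SUFFIX = "-initiated"  # assumed naming convention for status files


def oldest_initiated_barcode(status_dir: Path) -> Optional[str]:
    """Return the barcode whose -initiated file has the oldest
    modification time, or None if nothing is waiting."""
    candidates = [p for p in status_dir.glob(f"*{SUFFIX}") if p.is_file()]
    if not candidates:
        return None
    # Oldest request first, regardless of alphabetical order.
    oldest = min(candidates, key=lambda p: p.stat().st_mtime)
    return oldest.name[: -len(SUFFIX)]
```

Each cron invocation would call this once and process only the returned barcode, so a later invocation never inherits a list that an earlier one is still working through.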