Run this script after performing a successful batch ingest. It needs two directory paths: one to the local working directory where the original `metadata-n.csv` files (etc.) are stored, and one to the corresponding directory under `/success/` on the batch ingester. It backs up the original files and creates the files needed to update the objects through the batch ingester in the future.
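The two-path invocation could be sketched as below. This is a hypothetical illustration written for Python 3; the argument names are illustrative, not the script's actual interface.

```python
# Hypothetical sketch of the two-path invocation; argument names are
# illustrative assumptions, not the script's real command-line interface.
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Post-ingest backup and update-file generation")
    parser.add_argument("working_dir",
                        help="local working directory holding the metadata-n.csv files")
    parser.add_argument("success_dir",
                        help="matching directory under /success/ on the batch ingester")
    return parser.parse_args(argv)

args = parse_args(["./work", "/success/batch-1"])
print(args.working_dir, args.success_dir)  # → ./work /success/batch-1
```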
When updating something through the batch ingester, one needs two things:

- A copy of the original submission CSV with the field `curate_id` added (and with any files removed; see "Necessary follow-up steps").
- The `metadata-n.rof` file(s) and a JOB file, which re-add the properties field, including the thumbnails, to the object.

One then submits these as two separate ingests, in that order.
The script does the following:

- Creates a directory `/originals` in the local working directory if it doesn't already exist.
- Retrieves the successful ROFs from the remote directory into the working directory and renames them with the prefix `original-`.
- Copies the original CSVs from the working directory into `/originals`, renaming them with the prefix `original-`. This means we'll always have a copy of the original even as we make updates.
- Walks through the ROF files and extracts all non-GenericFile PIDs as a CSV with the header `curate_id`, in a file named `pid-n.csv`. The `n` in `pid-n` refers to the original CSV/ROF ingest; all PIDs are in the same order as the objects in the CSV and can simply be added to `metadata-n.csv` as a new column.
- Walks through the ROF files and creates ROFs, without the prefix `original-`, that have the information necessary to update thumbnails. Creates a directory called `/update-thumbnails` if it doesn't exist and moves these ROFs there.
- Moves the `original-metadata-n.rof` files to the `/originals` directory.
Necessary follow-up steps:

- Open the `metadata-n.csv` and `pid-n.csv` files.
- Remove the `files` column from `metadata-n.csv` (this is one reason we kept a copy of the originals).
- Add the `curate_id` column from `pid-n.csv` to `metadata-n.csv` and save. The ordering of the two files is the same, so no extra reconciliation is needed.
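The manual CSV edit above could also be done programmatically. A minimal Python 3 sketch (the `merge_for_update` helper is hypothetical, not part of the script), assuming simple single-header CSVs and relying on the two files sharing the same row order:

```python
# Hypothetical helper: drops the "files" column from metadata-n.csv and
# appends curate_id row-by-row from pid-n.csv, relying on matching order.
import csv

def merge_for_update(metadata_path, pid_path, out_path):
    with open(metadata_path) as f:
        rows = list(csv.DictReader(f))
    with open(pid_path) as f:
        pids = [r["curate_id"] for r in csv.DictReader(f)]
    # Keep every original column except "files", then add curate_id.
    fieldnames = [c for c in rows[0].keys() if c != "files"] + ["curate_id"]
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for row, pid in zip(rows, pids):
            row["curate_id"] = pid
            writer.writerow(row)
```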
Runs on Python 2.7.11. Requires installation of the Python jq library (possible via Homebrew) before it can be run.
To do:

- Add support for this being the second or third ingest, running only the process that copies the mid-process CSV files into a directory and keeps the most recent copy of the CSV in the main file.
- Add support for updating Generic Files. This would involve developing the appropriate jq to get just the Generic Files and, importantly, obtaining the `content_file` field, which holds the actual file name. Updating Generic Files is still largely untested at this stage, as it's more often the parent object's metadata that gets augmented.
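The proposed Generic File filter might look like the sketch below. This is an untested Python 3 illustration of the idea; the `af-model` and `content_file` field names are assumptions about the ROF layout, mirroring the jq filter described above.

```python
# Untested sketch of the proposed GenericFile support: select only the
# GenericFile objects from a ROF and pull out their "content_file" field.
# The "af-model" and "content_file" field names are assumptions.
import json

def generic_files(rof_path):
    with open(rof_path) as f:
        records = json.load(f)
    return [
        {"pid": r["pid"], "content_file": r.get("content_file")}
        for r in records
        if r.get("af-model") == "GenericFile"
    ]
```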