pal-museum-metadata

Metadata validation and packaging tools for Merritt Ingest.

This code will be run from a Cloud9 environment into which the following resources have been loaded.

File structure

an inventory listing of existing tif files residing in S3
- /mrt/inventory/inventory.txt
mods files describing the tif files
- /mrt/mods
temp dir for pulling tif file samples
- /mrt/files
code directory
- /home/ec2-user/environment/code/pal-museum-metadata

This Cloud9 environment is shared by members of the UC3 team.

Therefore, do not save your GitHub credentials in this working environment.

All of our code will live in a public repository, so it will be easy to pull code into this environment.

git fetch origin main

When you want to save changes back to GitHub, you have a few options.

git push origin main

cd ~/environment
python code/pal-museum-metadata/src/scan.py

What becomes a Merritt object?
What identifier(s) will be used?
- This will be used for any metadata updates
- What if we get access to the database?
What metadata will be stored with the images?
What percentage of objects have, or do not have, both images and metadata?
Create Merritt ingest manifest file(s) for each object
- Has identifier(s)
- Has ERC descriptive metadata
- Has full file list
  - URL to the mods files
    - Terry will build a web service to make these accessible to the ingest service (done)
  - URL to the image files
    - Terry will build a web service to make these accessible to the ingest service (done)

Analyze match between files in the inventory vs identifiers in mods
- List of matching image and metadata
- List images missing metadata
- List metadata missing images
Recommend local identifier(s) to use, likely some form of 0001.02.0001
Map mods fields to Merritt ERC
Hand-generate a manifest file for a single Pal Museum object (URLs depend on where mods and images are served)
- Create ingest manifest for an object with one or more files; supply metadata through Merritt UI
Load the hand-generated manifest to Merritt stage
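
The match analysis above (matching pairs, images missing metadata, metadata missing images) can be sketched with set operations. This is a minimal illustration, not the logic in src/scan.py. It assumes every image and mods file name contains an identifier shaped like the 0001.02.0001 example. The pattern, the function names, and the sample file names are all hypothetical.

```python
import re

# Assumed identifier shape, based on the "0001.02.0001" example above.
ID_PATTERN = re.compile(r"\d{4}\.\d{2}\.\d{4}")


def extract_ids(names):
    """Return the set of identifiers found in an iterable of file names or paths."""
    ids = set()
    for name in names:
        match = ID_PATTERN.search(name)
        if match:
            ids.add(match.group(0))
    return ids


def compare_ids(image_names, mods_names):
    """Split identifiers into matched, image-only, and mods-only lists."""
    image_ids = extract_ids(image_names)
    mods_ids = extract_ids(mods_names)
    return {
        "matched": sorted(image_ids & mods_ids),
        "images_missing_metadata": sorted(image_ids - mods_ids),
        "metadata_missing_images": sorted(mods_ids - image_ids),
    }


# Hypothetical sample names; the real inventory and mods layouts may differ.
report = compare_ids(
    ["tif/0001.02.0001.tif", "tif/0001.02.0002.tif"],
    ["0001.02.0001.xml", "0001.03.0007.xml"],
)
```

The lengths of the three lists also give the counts needed to report what share of objects have both images and metadata.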

Generate list of files per object identifier
Create ingest manifests for objects with one or more files; create a manifest of manifests to supply corresponding metadata
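
Generating the list of files per object identifier could look like the sketch below. It makes the same assumption: the identifier appears in each file path in a form like 0001.02.0001. The function name, the "_back" suffix, and the sample paths are hypothetical. Paths without a recognizable identifier are collected separately so they can be reviewed by hand.

```python
import re
from collections import defaultdict

# Assumed identifier shape, based on the "0001.02.0001" example above.
ID_PATTERN = re.compile(r"\d{4}\.\d{2}\.\d{4}")


def group_files_by_id(paths):
    """Group file paths by the identifier embedded in each path.

    Returns (groups, unmatched): a dict mapping identifier to its list of
    paths, and a list of paths in which no identifier was found.
    """
    groups = defaultdict(list)
    unmatched = []
    for path in paths:
        match = ID_PATTERN.search(path)
        if match:
            groups[match.group(0)].append(path)
        else:
            unmatched.append(path)
    return dict(groups), unmatched


# Hypothetical inventory entries; real paths may be structured differently.
groups, unmatched = group_files_by_id([
    "tif/0001.02.0001.tif",
    "tif/0001.02.0001_back.tif",
    "tif/0001.02.0002.tif",
    "tif/readme.txt",
])
```

Each key in `groups` would then correspond to one candidate Merritt object, with its value supplying the file list for that object's manifest.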

Metadata
- What is the format of our LocalId?
- What mods metadata do we want to publish?
What objects to publish?
- Mods + Image - Yes
- Mods only - ?
- Image only - ? (if a valid id can be created)
- Who consults on this decision?
Additional data
- What is in the new batch of data?
  - Images only?
  - Images and mods?
  - Replacement files?
- Copy to S3
- Modify program to pull new resources
What is in the database?
- Is there unique data in the database that is not already in mods?
- Can we associate this data with an identifier?
Manifest Generation (technical questions)
- Where should the web server run for the image files?
  - images are in S3, served by docker01
  - mods are in S3, served by docker01
- How will ERC metadata be associated with objects?
  - manifest of manifests?
  - erc files?
  - Terry will discuss this with Mark
- How will a batch of individual manifests be published to a url for the ingest process?
  - Presumably we will copy these into S3; additional rights will be needed to push to S3
  - Terry will discuss options