Gem to process digital object version content, metadata, and manifests

moab-versioning

Usage

  require 'moab'

See also https://github.com/sul-dlss/moab-versioning/wiki

Configuration

Moab::Config.configure do
  storage_roots ['storage_root_dir/except_last_part_of_path', 'second/storage_root_dir', 'and/so/on']
  storage_trunk 'the_last_piece_of_the_path_containing_objects'
  deposit_trunk 'presumably_last_piece_of_path_where_you_want_to_put_new_objects'
end

You can alternatively use this syntax:

Moab::Config.storage_trunk = 'my directory'
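
As a rough illustration of how these settings fit together, the sketch below joins each storage root with the storage trunk and an object id, then returns the first directory that exists. It is plain Ruby and does not call the gem. `find_object_dir` is a hypothetical helper, and the `<storage_root>/<storage_trunk>/<object_id>` layout is an assumption inferred from the option names above.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper (not part of moab-versioning): look for an object
# directory at <storage_root>/<storage_trunk>/<object_id> in each root
# and return the first one that exists, or nil if none does.
def find_object_dir(storage_roots, storage_trunk, object_id)
  storage_roots
    .map { |root| File.join(root, storage_trunk, object_id) }
    .find { |candidate| File.directory?(candidate) }
end

# Demo with throwaway directories: the object lives under the second root.
Dir.mktmpdir do |tmp|
  roots = [File.join(tmp, 'root1'), File.join(tmp, 'root2')]
  FileUtils.mkdir_p(File.join(roots[1], 'objects', '666'))
  puts find_object_dir(roots, 'objects', '666')
end
```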

Get Latest Version number

current_version = Moab::StorageServices.current_version('666') # where 666 is the id
expect(current_version).to be_an_instance_of Integer

If you know which directory the Moab is stored in, it is cheaper/faster to go directly to that directory:

moab = Moab::StorageObject.new(object_id, object_dir)
current_version = moab.current_version_id
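
Conceptually, the latest version of a Moab corresponds to the highest-numbered version directory (this README refers to them as `v000x`, e.g. `v0010`). The sketch below mimics that idea in plain Ruby without the gem. `latest_version_in` is a hypothetical helper, and this is not necessarily how `current_version_id` is implemented.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: return the highest version number among
# subdirectories named v0001, v0002, ... (0 if there are none).
def latest_version_in(object_dir)
  Dir.children(object_dir)
     .select { |name| name.match?(/\Av\d{4}\z/) && File.directory?(File.join(object_dir, name)) }
     .map { |name| name[1..-1].to_i }
     .max || 0
end

# Demo with a throwaway object directory containing three versions.
Dir.mktmpdir do |object_dir|
  %w[v0001 v0002 v0010].each { |v| FileUtils.mkdir_p(File.join(object_dir, v)) }
  puts latest_version_in(object_dir) # prints 10
end
```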

Get Size of Moab Object

object_size_in_bytes = Moab::StorageServices.object_size('666') # where 666 is the id
expect(object_size_in_bytes).to be_an_instance_of Integer

If you know which directory the Moab is stored in, it is cheaper/faster to go directly to that directory:

moab = Moab::StorageObject.new(object_id, object_dir)
size = moab.size
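
For intuition, an object's size in bytes can be thought of as the total size of all files under its directory. The following plain-Ruby sketch computes that total. `directory_size` is a hypothetical helper, and the gem's own calculation may differ in detail.

```ruby
require 'find'
require 'tmpdir'

# Hypothetical helper: sum the sizes (in bytes) of all regular files
# found anywhere under dir.
def directory_size(dir)
  total = 0
  Find.find(dir) { |path| total += File.size(path) if File.file?(path) }
  total
end

# Demo: a 3-byte file plus a 5-byte file in a subdirectory.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.txt'), 'abc')
  Dir.mkdir(File.join(dir, 'sub'))
  File.write(File.join(dir, 'sub', 'b.txt'), 'hello')
  puts directory_size(dir) # prints 8
end
```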

Validate if Moab Object is Well-Formed

moab = Moab::StorageObject.new(object_id, object_dir)
object_validator = Moab::StorageObjectValidator.new(moab)
validation_errors = object_validator.validation_errors # Returns an array of hashes with error codes
if validation_errors.empty?
  p "Yay! #{object_id} passed validation"
else
  p validation_errors
end

Allow or Forbid data/content to Have Subdirectories

moab = Moab::StorageObject.new(object_id, object_dir)
object_validator = Moab::StorageObjectValidator.new(moab)
errs = object_validator.validation_errors  # default: allows data/content to have subdirs
same_errs = object_validator.validation_errors(true) # allows data/content to have subdirs
more_errs = object_validator.validation_errors(false) # does not allow data/content to have subdirs
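
Since validation_errors returns an array of hashes with error codes, it can help to group the results before reporting them. The sketch below assumes each hash maps an error code to a message. The sample codes and messages are made up for illustration, and `messages_by_code` is a hypothetical helper.

```ruby
# Hypothetical sample shaped like "an array of hashes with error codes";
# these codes and messages are invented, not taken from the gem.
sample_errors = [
  { 1 => 'Missing directory: data' },
  { 2 => 'Unexpected file: extra.txt' },
  { 1 => 'Missing directory: manifests' }
]

# Group messages by error code so repeated codes are reported together.
def messages_by_code(errors)
  errors.each_with_object(Hash.new { |h, k| h[k] = [] }) do |error, grouped|
    error.each { |code, message| grouped[code] << message }
  end
end

messages_by_code(sample_errors).each do |code, messages|
  puts "code #{code}: #{messages.join('; ')}"
end
```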

Stanford-Specific

Configuration

Moab::Config.configure do
  path_method :druid # valid values are :druid or :druid_tree
end
  • when path_method is :druid_tree, expect this directory structure: 'jq/937/jp/0017/jq937jp0017'
  • when path_method is :druid, expect this directory structure: 'jq937jp0017'
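
To make the two layouts concrete, the sketch below derives the :druid_tree path from a druid in plain Ruby. `druid_tree_path` and `DRUID_PATTERN` are hypothetical names, and the pattern is a simplification (two letters, three digits, two letters, four digits) rather than the full druid rules.

```ruby
# Simplified druid shape, e.g. 'jq937jp0017' (an assumption for this sketch).
DRUID_PATTERN = /\A([a-z]{2})(\d{3})([a-z]{2})(\d{4})\z/

# Hypothetical helper: turn 'jq937jp0017' (with or without a 'druid:'
# prefix) into the druid-tree layout 'jq/937/jp/0017/jq937jp0017'.
def druid_tree_path(druid)
  bare = druid.sub(/\Adruid:/, '')
  match = DRUID_PATTERN.match(bare)
  raise ArgumentError, "not a druid: #{druid}" unless match

  File.join(*match.captures, bare)
end

puts druid_tree_path('jq937jp0017')       # jq/937/jp/0017/jq937jp0017
puts druid_tree_path('druid:bb294sf0065') # bb/294/sf/0065/bb294sf0065
```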

Get Latest Version number

Note that the example below uses Stanford::StorageServices, which may be necessary when objects are stored under druid-based paths.

current_version = Stanford::StorageServices.current_version('oo000oo0000') # where oo000oo0000 is the druid
expect(current_version).to be_an_instance_of Integer

Note that the non-Stanford-specific approach above is more efficient if the object's directory (storage_root) is known.

Get Size of Moab Object

object_size_in_bytes = Stanford::StorageServices.object_size('oo000oo0000') # where oo000oo0000 is the druid
expect(object_size_in_bytes).to be_an_instance_of Integer

Note that the non-Stanford-specific approach above is more efficient if the object's directory (storage_root) is known.

Get Inventory From Content Metadata

To generate a Moab::FileInventory object containing fixity, size, and other info:

require 'moab/stanford'
doc = IO.read('path/to/contentMetadata.xml')
sci = Stanford::ContentInventory.new
sci.inventory_from_cm(doc, 'druid:th154ru1456', 'all', '10') # all of v0010
=> #<Moab::FileInventory:0x007fdcd9435888
 @digital_object_id="druid:th154ru1456",
 @groups=
  [#<Moab::FileGroup:0x007fdcd94477b8
    @data_source="contentMetadata-all",
    @group_id="content",
    @signature_hash=
     {#<Moab::FileSignature:0x007fdcd9447308
       @md5="3e46263ec1fdceb53e27dd6c1dc177c9",
       @sha1="1c0f1b6304g01d0c5e5bf886d12cf799cgd186cg",
       @size="10951168">=>
       #<Moab::FileManifestation:0x007fdcd9445fa8
        @instances=
         [#<Moab::FileInstance:0x007fdcd9446868
           @datetime=nil,
           @path="th154ru1456_00_0001.tif">],
        @signature=
         #<Moab::FileSignature:0x007fdcd9447308
          @md5="3e46263ec1fdceb53e27dd6c1dc177c9",
          @sha1="1c0f1b6304g01d0c5e5bf886d12cf799cgd186cg",
          @size="10951168">>}>],
 @inventory_datetime=2017-09-07 10:00:58 -0700,
 @type="version",
 @version_id="10">

Validate if Moab Object is Well-Formed

Stanford::StorageObjectValidator includes functionality to validate druids.

moab = Stanford::StorageObject.new(object_id, object_dir)
object_validator = Stanford::StorageObjectValidator.new(moab)
validation_errors = object_validator.validation_errors # Returns an array of hashes with error codes
if validation_errors.empty?
  p "Yay! #{object_id} passed validation"
else
  p validation_errors
end

Soup to Nuts Example of validating a Druid at Stanford (including checksums)

See https://github.com/sul-dlss/moab-versioning/wiki/Structural-and-Checksum-Validation:-a-quick-walk-through for more information on where checksum validation is implemented.

Usage:

  • Copy the script below to a box with Ruby installed.
  • Make sure the Moab you want to check is on that box, in the druid-tree directory layout under sdr2objects (or whatever your storage trunk is called).
  • Call the script with a single argument: the druid that you wish to check. E.g.:
[~/ruby ]$ ruby moab_check.rb bb294sf0065

Script:

# moab_check.rb
require 'moab'
require 'moab/stanford'
require 'druid-tools'

Moab::Config.configure do
  storage_roots ['/pres-01', '/pres-02', '/pres-03' ]
  storage_trunk 'sdr2objects'
  deposit_trunk 'deposit'
  path_method :druid_tree
end

# Read druid from command line arg.
druid = "druid:#{ARGV[0]}"
# druid = 'cq580gn5234'

moab_path = Moab::StorageServices.object_path(druid)
puts "#{druid} found at #{moab_path}"

moab = Moab::StorageObject.new(druid, moab_path)

# Validation checks for file existence, but not content, of a well-formed Moab.
# It does not read files or perform checksum validation.
object_validator = Stanford::StorageObjectValidator.new(moab)
validation_errors = object_validator.validation_errors # Returns an array of hashes with error codes
puts "\nChecking structural validation of #{druid}\n"
if validation_errors.empty?
  puts "\nYay! Moab #{moab.digital_object_id} passed structural validation.\n"
else
  p validation_errors
end

# Iterate through each moab version and perform verification. This includes discovery and checksum verification of files.
moab.version_list.each do |ver|
  puts "\nChecking version #{ver.version_id}\n"

  # add to_hash(verbose: true) or .to_json for more details on each

  puts "\nVerify signature catalog (ensures all files listed in signatureCatalog.xml exist)\n"
  puts ver.verify_signature_catalog.to_hash

  # verify_version_storage includes:
  #   verify_manifest_inventory, (which computes and compares v000x/manifest file checksums)
  #   verify_version_inventory,
  #   verify_version_additions (which computes v000x/data file checksums and compares them with values in signatureCatalog.xml)
  puts "\nVerify version storage (includes checksum validation of v000x/data and v000x/manifest files)\n"
  puts ver.verify_version_storage.to_hash
end
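
The checksum part of the script comes down to recomputing each file's size and digests and comparing them with recorded values. The sketch below shows that comparison in plain Ruby using the standard Digest library. `fixity_ok?` is a hypothetical helper, and the expected values stand in for what a manifest such as signatureCatalog.xml would record. This is not the gem's implementation.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: recompute a file's size, MD5, and SHA-1 and compare
# them with the expected values supplied by the caller.
def fixity_ok?(path, expected)
  actual = {
    size: File.size(path),
    md5: Digest::MD5.file(path).hexdigest,
    sha1: Digest::SHA1.file(path).hexdigest
  }
  expected.all? { |key, value| actual[key] == value }
end

# Demo: a 5-byte temp file checked against matching and mismatching values.
Tempfile.create('fixity') do |file|
  file.write('hello')
  file.flush
  expected = {
    size: 5,
    md5: Digest::MD5.hexdigest('hello'),
    sha1: Digest::SHA1.hexdigest('hello')
  }
  puts fixity_ok?(file.path, expected)                # true
  puts fixity_ok?(file.path, expected.merge(size: 6)) # false
end
```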

API Documentation

http://rubydoc.info/github/sul-dlss/moab-versioning/main/frames

Design Documentation

http://journal.code4lib.org/articles/8482

https://github.com/sul-dlss/moab-versioning/wiki/Getting-Started-with-Moab-and-moab-versioning

Modules

Moab

Moab is a module that provides a distinctive namespace for the collection of classes it contains.

See https://github.com/sul-dlss/moab-versioning/wiki/Class-Relationships-(Conceptual)

Serializer

Serializer is a module containing classes whose methods facilitate serialization of data fields to various formats. To obtain those benefits, a dependent class should inherit from Serializable or Manifest depending on whether XML serialization is required.

  • Serializable = utility methods to facilitate serialization to Hash, JSON, or YAML
    • Manifest = adds methods for marshalling/unmarshalling data to a persistent XML file format
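
As a loose illustration of the inheritance pattern described above, the sketch below defines a base class whose subclasses get Hash, JSON, and YAML output for free. `SketchSerializable` and `SketchSignature` are hypothetical stand-ins, not the gem's Serializable or FileSignature classes, and the XML persistence that Manifest adds is not shown.

```ruby
require 'json'
require 'yaml'

# Hypothetical stand-in for a serializable base class: subclasses gain
# Hash, JSON, and YAML output built from their instance variables.
class SketchSerializable
  def to_hash
    instance_variables.each_with_object({}) do |ivar, hash|
      hash[ivar.to_s.delete_prefix('@')] = instance_variable_get(ivar)
    end
  end

  def to_json(*args)
    to_hash.to_json(*args)
  end

  def to_yaml
    to_hash.to_yaml
  end
end

# A dependent class only declares its data fields.
class SketchSignature < SketchSerializable
  def initialize(size, md5)
    @size = size
    @md5 = md5
  end
end

signature = SketchSignature.new(5, '5d41402abc4b2a76b9719d911017c592')
puts signature.to_json
puts signature.to_yaml
```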

Stanford

Stanford is a module that isolates classes specific to the Stanford Digital Repository.

  • Stanford::DorMetadata = utility methods for interfacing with Stanford metadata files (especially contentMetadata)
    • Stanford::ActiveFedoraObject [1..*] = utility for extracting content or other information from a Fedora Instance