Docker image of ace-audit-manager used to do regular fixity checking of a storage archive.

Primary language: Shell. License: Apache License 2.0 (Apache-2.0).

ACE Audit Manager docker

Introduction

ACE Audit Manager (ace-am) runs fixity checks on archival storage. It is designed to integrate with the ACE Integrity Management Service (ace-ims); taken together, the two form ACE, the Auditing Control Environment.

This docker image is part of a set created by the University of Arizona Libraries to implement open fixity solutions based on ACE Audit Manager.

Background

Within the ACE architecture, the role of ace-am is to provide fixity auditing of an archival collection, generating checksums of files to enable the auditing. The role of ace-ims is to prove that the checksums generated by an audit manager have not been corrupted.

There are usually many deployments of ACE Audit Manager, all of which connect to the same ACE Integrity Management Service.

This docker image was created by following the manual instructions for setting up the ACE Audit Manager. The source code for the entire ACE suite is hosted on GitLab.

A good comparison chart between different archival storage systems is provided by digitalpowrr.

Dependencies

Host system dependencies

  1. docker-compose is installed on the system.
  2. The host system's time is synchronized with a master NTP server.
  3. No other service on the system is listening at port 8080.
  4. Archival storage directories located or mounted under /mnt. This is changeable via the ACE_AUDIT_SHARES environment variable.
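
The host checks above can be partly automated. The following is a minimal sketch, not part of this repository: the function names are hypothetical, the port check assumes the `ss` utility from iproute2 is installed, and time synchronization is not checked at all.

```shell
# Hypothetical pre-flight helpers; they are not shipped with ace-audit-manager.

# Dependency 1: docker-compose must be on the PATH.
have_docker_compose() {
    command -v docker-compose >/dev/null 2>&1
}

# Dependency 3: succeed only if no TCP listener is bound to the given port.
# Assumes `ss` is available; if it is missing, the port is reported as free.
port_is_free() {
    ! ss -ltn 2>/dev/null | awk '{ print $4 }' | grep -q ":$1\$"
}

# Dependency 4: the shares directory must exist.
# Uses the first argument, else ACE_AUDIT_SHARES, else the documented default /mnt.
shares_dir_ok() {
    [ -d "${1:-${ACE_AUDIT_SHARES:-/mnt}}" ]
}

# Report every problem found; prints nothing when all checks pass.
preflight() {
    have_docker_compose || echo "docker-compose not found on PATH"
    port_is_free 8080   || echo "port 8080 is already in use"
    shares_dir_ok       || echo "shares directory is missing"
}
```

Running `preflight` before `docker-compose up -d` prints one line per failed check.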

External dependencies

  1. A MySQL-compatible database server to persist checksum values and runtime data. The database connection is controlled through docker environment variables. Note that MySQL is not a prerequisite of ace-am itself, but of this docker image, which has the MySQL JDBC driver pre-installed.
  2. An SMTP server to send emails. The SMTP server settings are configured by clicking the "System Settings" link at the top right once ace-am is up and running.
  3. An ACE Integrity Management Service to send audit tokens to. The public default of ims.umiacs.umd.edu:8080 is used unless an override is specified on the "System Settings" configuration page of ace-am.

Environment Variables

The following environment variables control the docker setup:

  • ACE_AUDIT_SHARES - the host directory containing archival content, mounted into the ace-am docker container so that it can be set up for auditing, defaults to '/mnt'.
  • ACE_AM_DATABASE - the database name to connect to on the database system, defaults to 'aceamdb'.
  • ACE_AMDB_HOST - the database system hostname to connect to, defaults to 'db-host'.
  • ACE_AMDB_PORT - the database system port to connect to, defaults to '3306'.
  • ACE_AMDBA_USER - the database user account to connect with, defaults to 'aceam'.
  • ACE_AMDBA_PASSWORD - the database user password to connect with, defaults to 'ace'.
  • ACE_AM_BOOTSTRAP_SLEEP - on the first startup of the container, the number of seconds to wait for a docker database container to complete bootstrapping, defaults to 45 seconds. When an external database is being used, this variable can be set to 0.
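
To see which values would apply in the current shell, the defaults listed above can be resolved with ordinary parameter expansion. This is a sketch only: the helper name `ace_am_settings` is hypothetical, and it assumes the compose files read these variables from the calling shell's environment and treat an empty value like an unset one.

```shell
# Hypothetical helper: print the effective settings using the documented defaults.
# The password is deliberately left out of the output.
ace_am_settings() {
    echo "ACE_AUDIT_SHARES=${ACE_AUDIT_SHARES:-/mnt}"
    echo "ACE_AM_DATABASE=${ACE_AM_DATABASE:-aceamdb}"
    echo "ACE_AMDB_HOST=${ACE_AMDB_HOST:-db-host}"
    echo "ACE_AMDB_PORT=${ACE_AMDB_PORT:-3306}"
    echo "ACE_AMDBA_USER=${ACE_AMDBA_USER:-aceam}"
    echo "ACE_AM_BOOTSTRAP_SLEEP=${ACE_AM_BOOTSTRAP_SLEEP:-45}"
}
```

For example, `ACE_AMDB_HOST=db.example.org ace_am_settings` (with a placeholder hostname) shows the override for the host and the defaults for everything else.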

Deployment

Two docker-compose deployments are provided:

  1. Self contained
  2. Singleton

Self contained

A docker-compose example integrating with a MySQL docker container is located at compose/fixity-db. If the public ace-ims service at ims.umiacs.umd.edu:8080 is used and an SMTP service is available, this docker-compose example provides a quick way to install and try out ace-am.

To test out ACE Audit Manager, run the following commands:


 git clone https://github.com/ualibraries/ace-audit-manager.git
 cd ace-audit-manager/compose/fixity-db
 docker-compose up -d

Then browse to http://localhost:8080/ace-am
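
On first startup the container may wait for the database to bootstrap (see ACE_AM_BOOTSTRAP_SLEEP), so the page may not answer immediately. A small polling loop can tell when it does. This is a sketch under assumptions: `wait_for_url` is a hypothetical helper, it requires `curl`, and it treats any HTTP status below 400 as "up".

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# $1 = URL, $2 = maximum number of attempts, $3 = seconds to pause between attempts.
wait_for_url() {
    url=$1
    tries=$2
    pause=$3
    while [ "$tries" -gt 0 ]; do
        # -f makes curl fail on HTTP errors (status >= 400); output is discarded.
        if curl -fs -o /dev/null --max-time 5 "$url" 2>/dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        # Only sleep if another attempt will follow.
        if [ "$tries" -gt 0 ]; then
            sleep "$pause"
        fi
    done
    return 1
}
```

For example, `wait_for_url http://localhost:8080/ace-am 30 5 && echo "ace-am is up"` tries up to 30 times, pausing 5 seconds between attempts.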

After getting it up and running, follow the "3. Register your first collection" instructions. The docker image mounts /mnt from the host into the docker container, so any shares mounted underneath it are available for creating a collection for fixity auditing.

To clean up the above test instance, run the following from the directory that contains the clone made earlier:


 cd ace-audit-manager/compose/fixity-db
 docker-compose rm -fsv
 docker volume prune  # Enter y

While the self-contained deployment is running, it consists of two docker containers, which can be verified by running docker ps -a:

  • fixitydb_audit_1 - contains ace audit manager running under tomcat
  • fixitydb_db-host_1 - contains a mysql database used by ace audit manager.

Singleton

The singleton docker-compose example located at compose/fixity just installs the ace-am by itself, so it requires an external database and ace-ims to connect to.

This docker-compose example is more likely to be used in a production environment where there is a dedicated database and ace-ims machine that are used by a number of ace-am machines.
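
A singleton start-up against an external database might look like the sketch below. Everything specific in it is an assumption: `db.example.org` and the password are placeholders, both function names are hypothetical, and it presumes the compose file in compose/fixity picks the variables up from the calling shell's environment.

```shell
# Hypothetical helper: export connection settings for an external database.
# db.example.org and change-me are placeholders, not real values.
singleton_env() {
    export ACE_AMDB_HOST=db.example.org
    export ACE_AMDB_PORT=3306
    export ACE_AM_DATABASE=aceamdb
    export ACE_AMDBA_USER=aceam
    export ACE_AMDBA_PASSWORD=change-me
    # An external database is already bootstrapped, so no startup wait is needed.
    export ACE_AM_BOOTSTRAP_SLEEP=0
}

# Hypothetical helper: start ace-am alone from the singleton compose directory.
# Assumes the repository was cloned into the current directory.
start_singleton() {
    singleton_env
    (cd ace-audit-manager/compose/fixity && docker-compose up -d)
}
```

The ace-ims address is not set here; as described under External dependencies, it is configured on the "System Settings" page once ace-am is running.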

Scripts

ACE Audit Manager is quite feature-rich; however, the scripts directory contains some scripts written to help work with huge (10+ terabyte) archival storage.

ACE can generate a duplicate file report by clicking on a collection, then clicking on the more...->Show Duplicate Files menu item. However, this times out for huge archives. A workaround is to download the checksum list report via more...->Download checkm list, then run the parse-checksum-duplicates.pl script against the checksum list on the command line:


 cat Summary.txt | ./parse-checksum-duplicates.pl > duplicates.txt

The duplicate files are organized by their common checksum value. This script requires that perl is installed and in the PATH environment variable list.

At the end of the report are two summations:

  • DUPLICATES_FILES - contains the total count of files that should get removed so that there are no more duplicates in the collection.
  • DUPLICATES_TOTAL - contains the number of duplicate sets, i.e. groups of files that all have the same checksum, within the collection.
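
The two summations can be illustrated with a short awk pipeline. This is not the parse-checksum-duplicates.pl script, and it does not read the real report format: it assumes a simplified input with one "checksum path" pair per line, and the function name and output labels are hypothetical.

```shell
# Hypothetical sketch: count duplicate sets and removable files.
# Input (assumed format): one line per file, checksum in the first column.
summarize_duplicates() {
    awk '
        { count[$1]++ }                  # files seen per checksum
        END {
            sets = 0; removable = 0
            for (sum in count) {
                if (count[sum] > 1) {
                    sets++                          # one more duplicate set
                    removable += count[sum] - 1     # keep one copy, remove the rest
                }
            }
            printf "duplicate_sets=%d removable_files=%d\n", sets, removable
        }
    '
}
```

With three files sharing one checksum, two sharing another, and one unique file, the sketch reports two duplicate sets and three removable files, which corresponds to the meaning of DUPLICATES_TOTAL and DUPLICATES_FILES respectively.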

Known Issues

ACE is currently known to have performance issues when finding missing files during peer comparison. Suppose a large number of files in a collection are deleted on one ACE server but not on the other, and an audit of that collection is run from the server that still has all the files, with peer comparison against the server where the files are missing. The audit then processes around 3 files per minute, so it cannot complete in a reasonable time frame and is effectively useless in this case. Current digital preservation requirements call for audits to be run every 90 days.
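
To get a feel for the scale, the rate can be turned into days. The helper below is a hypothetical back-of-the-envelope sketch; it assumes the rate of roughly 3 files per minute holds constantly for every affected file.

```shell
# Hypothetical helper: days needed to process N files at a fixed rate, rounded up.
# $1 = number of files, $2 = files per minute (defaults to 3).
audit_days() {
    files=$1
    per_day=$(( ${2:-3} * 60 * 24 ))    # 3 files/minute is 4320 files/day
    echo $(( (files + per_day - 1) / per_day ))
}
```

Under that assumption, `audit_days 388800` gives 90, so a 90-day audit cycle covers at most 388,800 such files, and `audit_days 1000000` gives 232 days.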

Two ways to bypass this are:

  1. Make sure when files are deleted from one ACE server they are deleted from both, which would be the proper way to meet digital preservation requirements.
  2. Disable peer comparison, which leads to audits showing no errors on the ACE server where the files were deleted.

Option 1 is the preferred method at this time.