This project creates the Dutch Digital Heritage Network virtual digital preservation research environment. The environment is a virtual machine set up with a set of digital preservation tools installed and ready to use from the desktop. The supporting documentation has been broken down into 3 distinct areas:
- Quick Start
- VRE User Guide
- Reference links
This prototype comes with four open source digital preservation tools installed. These were selected for ease of use, as they all have graphical user interfaces, and for homogeneity, as they're all Java based.
- DROID: A file format identification tool developed and maintained by The National Archives of the UK
- JHOVE: A format validation and characterisation tool developed by Harvard University Library and the Open Preservation Foundation
- Homepage: https://jhove.openpreservation.org
- GitHub: https://github.com/openpreserve/jhove
- Apache Tika: A characterisation and text extraction tool developed and maintained by the Apache Software Foundation
- Homepage: https://tika.apache.org/
- GitHub: https://github.com/apache/tika
- veraPDF: A validation and characterisation tool for the PDF/A format
- Homepage: https://docs.verapdf.org
- GitHub: https://github.com/veraPDF/veraPDF-library
- HandBrake: A tool for converting video from nearly any format to a selection of modern, widely supported codecs
- Homepage: https://handbrake.fr/
- GitHub: https://github.com/HandBrake/HandBrake
The quickest way to try out the environment is to download the machine image.
You'll need VirtualBox on your machine to act as the virtualisation platform. If you're installing VirtualBox:
- Check that you have hardware virtualisation enabled in your BIOS.
- Please install the Extension Pack.
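On a Linux host, a quick first check is whether the CPU advertises hardware virtualisation at all. The sketch below is only a starting point: it assumes a Linux `/proc/cpuinfo`, the helper name `has_hw_virt` is made up for this example, and a flag being present does not guarantee that the feature is switched on in the BIOS.

```shell
#!/bin/sh
# Succeeds if the given cpuinfo file lists a hardware virtualisation flag:
# "vmx" (Intel VT-x) or "svm" (AMD-V). Defaults to the Linux /proc/cpuinfo.
has_hw_virt() {
    grep -Eqw 'vmx|svm' "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_hw_virt; then
    echo "CPU reports hardware virtualisation support"
else
    echo "No vmx/svm flag found; check the BIOS/UEFI settings"
fi
```

On Windows or macOS hosts this file does not exist, so the script reports that no flag was found; use the manufacturer's instructions there instead.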
Rather than build a Vagrant machine, you can download a prebuilt OVA file: [VREv1.0](https://ddhn.openpreservation.org/ddhn-rc.ov). The download takes some time as it's about 4GB. When it's finished you should have a file called `ddhn-rc.ova`.
These instructions tell you how to import the OVA file into VirtualBox so you can start it.
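If you prefer the command line to the VirtualBox GUI, the import can also be done with `VBoxManage`, which ships with VirtualBox. This is a minimal sketch rather than the project's documented procedure: the wrapper name `import_vre` is hypothetical, and it assumes `VBoxManage` is on your `PATH`.

```shell
#!/bin/sh
# import_vre OVA_FILE
# Import a downloaded OVA into VirtualBox after two basic sanity checks.
import_vre() {
    ova="$1"
    if [ ! -f "$ova" ]; then
        echo "OVA file not found: $ova" >&2
        return 1
    fi
    if ! command -v VBoxManage >/dev/null 2>&1; then
        echo "VBoxManage not found on PATH; is VirtualBox installed?" >&2
        return 2
    fi
    # Creates a new VM from the appliance using its packaged settings
    VBoxManage import "$ova"
}
```

From the download directory this would be called as `import_vre ddhn-rc.ova`. Once the import finishes, the machine should appear in the VirtualBox Manager ready to start.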
- Username: `vagrant`
- Password: `vagrant`
We're currently using the default `vagrant` account used to create the virtual machine image. The user name is configurable from the Ansible build variables. The machine should automatically log in with these credentials but if you need to log in then the password is also `vagrant`. The account also has passwordless `sudo` privileges so can be used to fix, or break, most things. This may change when we release a production version.
Out of the box the machine should come configured with:
- 2 virtual CPUs
- 64MB of video RAM to allow desktop scaling
- 4GB of RAM
More CPU and RAM will almost certainly improve performance. If you're setting up a vagrant box from scratch you can use the Initialisation instructions to change the parameters. If you've imported the OVA you can use the VirtualBox GUI to make the changes as described here.
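For an imported machine, the same change can be scripted with `VBoxManage modifyvm` instead of the GUI. The sketch below makes some assumptions: the machine is powered off, `VBoxManage` is on your `PATH`, and you pass the VM name exactly as VirtualBox lists it. The helper names `is_number` and `resize_vm` are hypothetical.

```shell
#!/bin/sh
# Succeeds only for a non-empty string of digits.
is_number() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# resize_vm VM_NAME CPUS MEMORY_MB
# Change the CPU count and RAM of a powered-off VirtualBox machine.
resize_vm() {
    vm="$1"; cpus="$2"; mem_mb="$3"
    if ! is_number "$cpus" || ! is_number "$mem_mb"; then
        echo "usage: resize_vm VM_NAME CPUS MEMORY_MB" >&2
        return 1
    fi
    if ! command -v VBoxManage >/dev/null 2>&1; then
        echo "VBoxManage not found on PATH" >&2
        return 2
    fi
    VBoxManage modifyvm "$vm" --cpus "$cpus" --memory "$mem_mb"
}
```

For example, `resize_vm "DDHN Prototype" 4 8192` would request 4 CPUs and 8GB of RAM for a machine with that name.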
There are a few flavours of VM for a particular OS. The project team have already agreed that Debian 9 (Stretch) was a sensible starting choice. The two main criteria that guided the decision were stability and long update cycles.
VirtualBox was chosen as the virtualisation platform because of its cross-platform ubiquity. Vagrant is a tool designed for building and managing virtual machine environments. It was chosen to speed up the initial VirtualBox provisioning. Vagrant Cloud provides a collection of cookie-cutter virtual machines. The Vagrant machine chosen as a starting point was an official Debian Stretch build with the addition of the VirtualBox shared folder kernel module: https://app.vagrantup.com/debian/boxes/contrib-stretch64.
A Vagrant machine is configured by a `Vagrantfile`, which can be set up with the appropriate virtual machine template:

    vagrant init debian/contrib-stretch64

Before starting the machine we want to configure a few things out of the box. By default Vagrant machines are headless, i.e. all access via terminal and SSH with no GUI. We also need to provision the memory and number of CPUs available to the machine. While cores and memory are plentiful on a development workstation, 2 virtual CPUs and 4GB of RAM are sensible starting parameters. Anything requiring significantly more compute power would struggle to satisfy the accessible research environment brief. These parameters can be adjusted in situ regardless.
We can set these up for a VirtualBox VM by adding the following lines to our `Vagrantfile`; we'll also set a VM name while we're at it:

    config.vm.provider "virtualbox" do |vb|
      # Name the prototype machine
      vb.name = "DDHN Prototype"
      # Display the VirtualBox GUI when booting the machine
      vb.gui = true
      # Customize the CPUs (2x) and memory (4GB) on the VM:
      vb.cpus = 2
      vb.memory = "4096"
      # Now set an execution cap at 50% if required
      # vb.customize ["modifyvm", :id, "--cpuexecutioncap", "50"]
      # We need extra Video RAM for display flexibility
      vb.customize ["modifyvm", :id, "--vram", "64"]
    end

We can now bring the machine up with the command `vagrant up`; this takes a while the first time.
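Once the machine is up, it can be reassuring to confirm that the CPU, memory and video RAM settings took effect. The sketch below filters the output of `VBoxManage showvminfo --machinereadable`. It assumes that output uses the keys `cpus`, `memory` and `vram`, and the helper name `vm_settings` is made up for this example.

```shell
#!/bin/sh
# Filter "VBoxManage showvminfo --machinereadable" output (read from stdin)
# down to the CPU count, memory (MB) and video RAM (MB) settings.
vm_settings() {
    grep -E '^(cpus|memory|vram)='
}

# Example, once the machine exists (requires VirtualBox on the host):
#   VBoxManage showvminfo "DDHN Prototype" --machinereadable | vm_settings
```

With the `Vagrantfile` settings above, we would expect the filtered lines to report 2 CPUs, 4096MB of memory and 64MB of video RAM.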
Provisioning covers installation of the software tools and dependencies as well as configuration of the OS and user environment. Ansible is a cross platform IT automation tool that simply requires SSH access to the target machine.
Vagrant features built-in support for Ansible provisioning. The following section of the `Vagrantfile` invokes Ansible provisioning the first time that the VM is started using the `vagrant up` command. After the first start, the provisioning section can be invoked alone by using the `vagrant provision` command. The `Vagrantfile` section looks like:

    config.vm.provision "ansible" do |ansible|
      # Use the playbook ./ansible/initialise-env.yml
      ansible.playbook = "ansible/initialise-env.yml"
      # Let's ask for verbose output in case of problems
      ansible.verbose = "vv"
      # Limit the use of this playbook to a particular host
      ansible.limit = "env.ddhn.test"
      # Ansible job requirements, we need NGINX
      ansible.galaxy_role_file = "ansible/requirements.yml"
      ansible.galaxy_command = "ansible-galaxy install --role-file=%{role_file}"
      # The inventory file that sets up details for the vagrant machine
      ansible.inventory_path = "ansible/vagrant.yml"
    end

The playbook `ansible/initialise-env.yml` is the list of roles that set up the virtual research environment. An Ansible role is simply a set of tasks that achieve a desired state, e.g. install software, copy files, etc. The next two sections break down the sub-roles, describing the general steps taken and the rationale.
The `ddhn.setup` role handles the setup of the environment, updating the OS, installing dependencies, creating accounts and the like. The main role simply calls four sub-roles.
The `server.yml` sub-role:
- updates apt packages;
- sets up the hostname; and
- sets the timezone.
The `prerequisites.yml` sub-role installs any apt package dependencies. The package list is the `ddhn_env_apt_defaults` variable in the role's main default file.
The `user.yml` sub-role creates a sudo user to administer the environment. Again, the task is configurable using variables in the role's main default file.
The `security` role hardens SSH access (no password and no root access) and sets up firewall rules. The thinking is that the environment should be secure, with port access only opened where required.
The `ddhn.tools` role installs the digital preservation tools. It comprises a series of sub-roles, one for each tool. The general workflow for a tool is:
- download the tool source to `/usr/local/src/`;
- download the tool installation package and install to `/usr/local/lib/<tool-name>`;
- add any required symlinks to `/usr/local/bin` so that tool executables are effectively on the path; and
- put an icon for the tool GUI on the desktop.
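The install, symlink and desktop icon steps of this workflow can be approximated in plain shell. The real work is done by the Ansible sub-roles, so treat this only as an illustration of the layout they aim for. The function names and the optional `PREFIX` argument are assumptions made for the example, the download step is left out, and the package is assumed to be a gzipped tarball that has already been fetched.

```shell
#!/bin/sh
# install_tool NAME ARCHIVE EXECUTABLE [PREFIX]
# Unpack a tool package into PREFIX/lib/NAME and symlink its executable
# into PREFIX/bin. PREFIX defaults to /usr/local.
install_tool() {
    name="$1"; archive="$2"; exe="$3"; prefix="${4:-/usr/local}"
    lib_dir="$prefix/lib/$name"
    mkdir -p "$lib_dir" "$prefix/bin" || return 1
    tar -xzf "$archive" -C "$lib_dir" || return 1
    # The symlink puts the tool executable on the PATH
    ln -sf "$lib_dir/$exe" "$prefix/bin/$exe"
}

# add_desktop_icon NAME EXECUTABLE_PATH DESKTOP_DIR
# Write a minimal desktop launcher for the tool's GUI.
add_desktop_icon() {
    name="$1"; exe_path="$2"; desktop_dir="$3"
    mkdir -p "$desktop_dir" || return 1
    cat > "$desktop_dir/$name.desktop" <<EOF
[Desktop Entry]
Type=Application
Name=$name
Exec=$exe_path
Terminal=false
EOF
    chmod +x "$desktop_dir/$name.desktop"
}
```

Passing a scratch directory as `PREFIX` lets you try the layout without touching `/usr/local` or needing root access.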
VRE is a pre-configured virtual machine environment with an installed set of digital preservation (DP) tools for use directly from your desktop. The available DP tools are:
- JHOVE
- fido
- veraPDF
- jpylyzer
- HandBrake (FFmpeg)
In order to use the VRE you will need to download the software that creates the virtual environment and the VRE image itself. You may also need to check that your desktop has been set up to support virtualisation.
If you have a system administrator, ask them to check whether the BIOS settings on your desktop have been set up for virtualisation. If not, the BIOS setting for virtualisation needs to be enabled at startup. The process for doing this varies, so refer to the manufacturer's instructions as needed. Resources: enabling virtualisation example
VRE has been built and tested using Oracle VirtualBox. VirtualBox can be installed on several operating systems including Windows (NT 4.0, 2000, XP, Server 2003, Vista, Windows 7, Windows 8, Windows 10), DOS/Windows 3.x, Linux (2.4, 2.6, 3.x and 4.x), Solaris and OpenSolaris, OS/2, and OpenBSD. Note that VRE has been baselined to work with Oracle VirtualBox v6.1.1.6.
Both VirtualBox and the accompanying Extension Pack need to be installed. They can be accessed via the following links: VirtualBox download and VirtualBox Extension Pack download. These links will take you to the Oracle VirtualBox download page. From here select the relevant VirtualBox platform package. The options are:
- Windows
- OS X
- Linux
- Solaris
Selecting a package will initiate the download. When it completes, start the VirtualBox Extension Pack download.
Resources:
- VirtualBox OS support
- VirtualBox installation instructions
- VirtualBox download
- VirtualBox Extension Pack download
VRE is distributed as a single machine image in a prebuilt OVA file. The most current version can be downloaded via the following link: VREv1.0. Note that this is a 4GB file and is likely to take several minutes to download.