The DIP Access Interface (real name pending) is a Django project designed to provide access to Dissemination Information Packages (i.e., access copies of digital files stored in Archival Information Packages in CCA's Archivematica-based digital repository). The project is specifically designed around the custom DIPs generated by create-dip.py via Artefactual's Automation Tools.
The primary application, "dips", allows users to add, organize, and interact with these DIPs and the digital files they contain, depending on user permissions.
The application organizes and displays information at several levels (a model sketch follows the list):
- Collection: A Collection is the highest level of organization and corresponds to an archive or other assembled collection of materials. A Collection has 0 to many Folders as children (in practice, every collection should have at least one child, but this is not enforced by the application). Collections may also have unqualified Dublin Core descriptive metadata, as well as a link to a finding aid.
- Folder: A Folder corresponds 1:1 with a Dissemination Information Package (DIP). A Folder has 1 to many Digital Files as children, which are auto-generated from information in the AIP METS file included as part of the CCA-style DIP. Folders may also have unqualified Dublin Core metadata. The DC metadata from the most recently updated dmdSec is written into the Folder record when the METS file is uploaded, except for "ispartof", which is hard-coded when the Folder is created (this may be worth changing for more generalized usage).
- Digital File: A Digital File corresponds to a description of an original digital file in the AIP METS file and contains detailed metadata from an AIP METS file amdSec, including a list of PREMIS events. Digital Files should never be created manually, but only generated via parsing of the METS file when a new Folder is added.
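As a rough illustration, this hierarchy could map onto Django models along the following lines; the model and field names here are assumptions for the sketch, not the actual schema of the "dips" app:

from django.db import models

class Collection(models.Model):
    # Top level; holds unqualified Dublin Core metadata and a finding aid link
    identifier = models.CharField(max_length=50)
    title = models.CharField(max_length=200, blank=True)
    link = models.URLField(blank=True)  # finding aid

class DIP(models.Model):
    # A Folder: corresponds 1:1 with a Dissemination Information Package
    identifier = models.CharField(max_length=50)
    ispartof = models.ForeignKey(
        Collection, related_name='dips', on_delete=models.CASCADE)
    objectszip = models.FileField(upload_to='objectszips/')

class DigitalFile(models.Model):
    # Generated by parsing the AIP METS file; never created manually
    uuid = models.CharField(max_length=36, primary_key=True)
    filepath = models.TextField()
    dip = models.ForeignKey(
        DIP, related_name='digital_files', on_delete=models.CASCADE)

class PREMISEvent(models.Model):
    # One row per PREMIS event attached to a digital file
    uuid = models.CharField(max_length=36, primary_key=True)
    eventtype = models.CharField(max_length=200, blank=True)
    digitalfile = models.ForeignKey(
        DigitalFile, related_name='events', on_delete=models.CASCADE)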
When a sufficiently privileged user creates a new Folder through the GUI, they need only enter the identifier, choose the Collection to which the Folder belongs, and upload a copy of the zipped digital objects from the CCA-style DIP. The application then uses the parsemets.py script to parse the AIP METS file included in the DIP (a simplified sketch follows the list), automatically:
- Saving Dublin Core metadata found in the (most recently updated) dmdSec to the DIP model object for the Folder
- Generating records for Digital Files and the PREMIS events associated with each digital file and saving them to the database.
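For illustration only, the core of that first step might look like this minimal sketch, assuming lxml is available; the real parsemets.py also walks the amdSec elements to build the Digital File and PREMIS event records:

from lxml import etree

NAMESPACES = {
    'mets': 'http://www.loc.gov/METS/',
    'dc': 'http://purl.org/dc/elements/1.1/',
}

def parse_dc(mets_path):
    """Return Dublin Core metadata from the most recently created dmdSec."""
    tree = etree.parse(mets_path)
    dmdsecs = tree.findall('mets:dmdSec', namespaces=NAMESPACES)
    if not dmdsecs:
        return {}
    # CREATED attributes are ISO 8601 timestamps, so a string sort suffices
    latest = max(dmdsecs, key=lambda d: d.get('CREATED', ''))
    metadata = {}
    for elem in latest.findall('.//dc:*', namespaces=NAMESPACES):
        metadata[etree.QName(elem).localname] = elem.text or ''
    return metadata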
In a future version of the application, it should be possible to upload a new DIP via a REST API (not yet implemented), which will similarly populate the database from the METS file.
Once the DIP has been uploaded, the metadata for the Folder can be edited through the GUI by any user with sufficient permissions.
By default, the application has five levels of permissions:
- Administrators: Administrators have access to all parts of the application.
- Managers: Users in this group can manage users but not make them administrators.
- Editors: Users in this group can add and edit Collections and Folders but not delete them.
- Public: Users with a username/password but no additional permissions have view-only access.
- Unauthenticated: Users who are not logged in can only access the FAQ and login pages.
For more information, see the user management and permissions feature file.
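As an illustration, these levels map naturally onto Django's built-in groups, and a view could then be guarded roughly as below (the group name is an assumption for the sketch; the feature file is authoritative):

from django.contrib.auth.decorators import login_required, user_passes_test
from django.http import HttpResponse

def is_editor(user):
    # Administrators (superusers) pass implicitly; everyone else needs
    # membership in an "Editors" group to add or edit records
    return user.is_superuser or user.groups.filter(name='Editors').exists()

@login_required
@user_passes_test(is_editor)
def new_folder(request):
    # Only Editors and Administrators reach this point; Public users and
    # unauthenticated visitors are redirected away
    return HttpResponse('form goes here')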
The following steps are an example of how to run the application in a production environment on Ubuntu 16.04. Requirements:
- Python 3.4 or higher
- Elasticsearch 6.x
The following environment variables are used to run the application (see the settings sketch after the list):
- ES_HOSTS [REQUIRED]: Comma-separated list of Elasticsearch hosts. RFC-1738 formatted URLs can be used, e.g.: https://user:secret@host:443/
- ES_TIMEOUT: Timeout in seconds for Elasticsearch requests. Default: 10.
- ES_POOL_SIZE: Elasticsearch requests pool size. Default: 10.
- ES_INDEXES_SHARDS: Number of shards for Elasticsearch indexes. Default: 1.
- ES_INDEXES_REPLICAS: Number of replicas for Elasticsearch indexes. Default: 0.
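Conceptually, the Django settings read these variables from the environment along these lines (a sketch under the documented defaults, not the literal settings.py):

import os

ES_HOSTS = os.environ['ES_HOSTS'].split(',')  # required, no default
ES_TIMEOUT = int(os.environ.get('ES_TIMEOUT', 10))
ES_POOL_SIZE = int(os.environ.get('ES_POOL_SIZE', 10))
ES_INDEXES_SHARDS = int(os.environ.get('ES_INDEXES_SHARDS', 1))
ES_INDEXES_REPLICAS = int(os.environ.get('ES_INDEXES_REPLICAS', 0))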
As the root user, install pip and virtualenv:
apt-get update
apt-get upgrade
apt-get install gcc python3-dev
wget https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py
rm get-pip.py
pip install virtualenv
And install Java 8 and Elasticsearch:
apt-get install apt-transport-https openjdk-8-jre
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-6.x.list
apt-get update
apt-get install elasticsearch
systemctl daemon-reload
systemctl start elasticsearch
systemctl enable elasticsearch
Verify Elasticsearch is running:
curl -XGET http://localhost:9200
{
  "name" : "ofgAtrJ",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "3h9xSrVlRJmDHgQ8FLnByA",
  "version" : {
    "number" : "6.3.0",
    "build_hash" : "db0d481",
    "build_date" : "2017-02-09T22:05:32.386Z",
    "build_snapshot" : false,
    "lucene_version" : "6.4.1"
  },
  "tagline" : "You Know, for Search"
}
Create a user to own and run the application, log in as that user, and make sure you are in its home folder:
adduser accesspoc
su - accesspoc
cd ~
Create an environment file at ~/accesspoc-env, containing at least the required variable, so it can be referenced where needed. For example:
ES_HOSTS=localhost:9200
Clone the repository and go to its directory:
git clone https://github.com/CCA-Public/dip-access-interface
cd dip-access-interface
Until separate settings files are added and this value is moved to the environment, you'll need to edit accesspoc/accesspoc/settings.py to add the host IP or domain name to the ALLOWED_HOSTS variable. E.g.:
ALLOWED_HOSTS = ['example.com']
Create a Python virtual environment and install the application requirements:
virtualenv venv -p python3
source venv/bin/activate
pip install -r requirements/production.txt
Export the environment for the manage.py commands:
export $(cat ~/accesspoc-env)
Initialize the database:
accesspoc/manage.py migrate
Create search indexes:
accesspoc/manage.py index_data
Add a superuser:
accesspoc/manage.py createsuperuser
Follow the instructions to create a user with full admin rights.
You can now deactivate the environment and go back to the root session:
deactivate && exit
The application requirements install Gunicorn and WhiteNoise to serve the application, including the static files. Back as the root user, create a systemd service file at /etc/systemd/system/accesspoc-gunicorn.service with the following content:
[Unit]
Description=Accesspoc Gunicorn daemon
After=network.target
[Service]
User=accesspoc
Group=accesspoc
PrivateTmp=true
PIDFile=/home/accesspoc/accesspoc-gunicorn.pid
EnvironmentFile=/home/accesspoc/accesspoc-env
WorkingDirectory=/home/accesspoc/dip-access-interface/accesspoc
ExecStart=/home/accesspoc/dip-access-interface/venv/bin/gunicorn \
    --access-logfile /dev/null \
    --worker-class gevent \
    --workers 4 \
    --bind unix:/home/accesspoc/accesspoc-gunicorn.sock \
    accesspoc.wsgi:application
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
[Install]
WantedBy=multi-user.target
Start and enable the service:
systemctl start accesspoc-gunicorn
systemctl enable accesspoc-gunicorn
To access the service logs, use:
journalctl -u accesspoc-gunicorn
The Gunicorn service uses a Unix socket to listen for connections, so we will use Nginx to proxy the application:
apt-get install nginx
nano /etc/nginx/sites-available/accesspoc
With a basic configuration:
upstream accesspoc {
    server unix:/home/accesspoc/accesspoc-gunicorn.sock;
}

server {
    listen 80;
    server_name example.com;
    client_max_body_size 500M;

    location / {
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_redirect off;
        proxy_buffering off;
        proxy_pass http://accesspoc;
    }
}
Link the site configuration to sites-enabled and remove the default configuration:
ln -s /etc/nginx/sites-available/accesspoc /etc/nginx/sites-enabled
rm /etc/nginx/sites-enabled/default
Verify configuration and restart Nginx service:
nginx -t
systemctl restart nginx
Alternatively, the application can be run with Docker, which requires Docker CE and Docker Compose.
Clone the repository and go to its directory:
git clone https://github.com/CCA-Public/dip-access-interface
cd dip-access-interface
Build images, initialize services, etc.:
docker-compose up -d
Initialize database:
docker-compose exec accesspoc ./manage.py migrate
Create search indexes:
docker-compose exec accesspoc ./manage.py index_data
Add a superuser:
docker-compose exec accesspoc ./manage.py createsuperuser
Follow the instructions to create a user with full admin rights.
To keep the Docker image as small as possible, the build dependencies are removed after the requirements are installed. As a result, executing tox inside the container will fail to install those requirements. If you don't have Tox installed on the host and need to run the application tests and syntax checks, use one of the following commands to create a one-off container for that purpose:
docker run --rm -t -v `pwd`:/src -w /src python:3.6 /bin/bash -c "pip install tox && tox"
docker run --rm -t -v `pwd`:/app omercnet/tox
Access the logs:
docker-compose logs -f accesspoc elasticsearch
To access the application with the default options visit http://localhost:43430 in the browser.