The DIP Access Interface (real name pending) is a Django project designed to provide access to Dissemination Information Packages (i.e., access copies of digital files stored in Archival Information Packages in CCA's Archivematica-based digital repository). The project is specifically designed around the custom DIPs generated by create-dip.py via Artefactual's Automation Tools.
The primary application, "dips", allows users to add, organize, and interact with these DIPs and the digital files they contain, depending on user permissions.
The application organizes and displays information at several levels (a model sketch follows the list):
- Collection: A Collection is the highest level of organization and corresponds to an archive or other assembled collection of materials. A Collection has 0 to many Folders as children (in practice, every collection should have at least one child, but this is not enforced by the application). Collections may also have unqualified Dublin Core descriptive metadata, as well as a link to a finding aid.
- Folder: A Folder corresponds 1:1 with a Dissemination Information Package (DIP). A Folder has 1 to many Digital Files as children, which are auto-generated from information in the AIP METS file included as part of the CCA-style DIP. Folders may also have unqualified Dublin Core metadata. The DC metadata from the most recently updated dmdSec is written into the Folder record when the METS file is uploaded, except for "ispartof", which is hard-coded when the Folder is created (this may be worth changing for more generalized usage).
- Digital File: A Digital File corresponds to a description of an original digital file in the AIP METS file and contains detailed metadata from an AIP METS file amdSec, including a list of PREMIS events. Digital Files should never be created manually, but only generated via parsing of the METS file when a new Folder is added.
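As a rough illustration, this hierarchy could map onto Django models along the following lines; the model and field names here are assumptions for the sketch, not the actual schema of the "dips" app:

from django.db import models

class Collection(models.Model):
    # Top level; holds unqualified Dublin Core metadata and a finding aid link
    identifier = models.CharField(max_length=50)
    title = models.CharField(max_length=200, blank=True)
    link = models.URLField(blank=True)  # finding aid

class DIP(models.Model):
    # A Folder: corresponds 1:1 with a Dissemination Information Package
    identifier = models.CharField(max_length=50)
    ispartof = models.ForeignKey(
        Collection, related_name='dips', on_delete=models.CASCADE)
    objectszip = models.FileField(upload_to='objectszips/')

class DigitalFile(models.Model):
    # Generated by parsing the AIP METS file; never created manually
    uuid = models.CharField(max_length=36, primary_key=True)
    filepath = models.TextField()
    dip = models.ForeignKey(
        DIP, related_name='digital_files', on_delete=models.CASCADE)

class PREMISEvent(models.Model):
    # One row per PREMIS event attached to a digital file
    uuid = models.CharField(max_length=36, primary_key=True)
    eventtype = models.CharField(max_length=200, blank=True)
    digitalfile = models.ForeignKey(
        DigitalFile, related_name='events', on_delete=models.CASCADE)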
When a sufficiently privileged user creates a new Folder through the GUI, they need only enter the identifier, choose the Collection to which the Folder belongs, and upload a copy of the zipped digital objects from the CCA-style DIP. The application then uses the parsemets.py script to parse the AIP METS file included in the DIP (a simplified sketch follows the list), automatically:
- Saving Dublin Core metadata found in the (most recently updated) dmdSec to the DIP model object for the Folder
- Generating records for Digital Files and the PREMIS events associated with each digital file and saving them to the database.
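For illustration only, the core of that first step might look like this minimal sketch, assuming lxml is available; the real parsemets.py also walks the amdSec elements to build the Digital File and PREMIS event records:

from lxml import etree

NAMESPACES = {
    'mets': 'http://www.loc.gov/METS/',
    'dc': 'http://purl.org/dc/elements/1.1/',
}

def parse_dc(mets_path):
    """Return Dublin Core metadata from the most recently created dmdSec."""
    tree = etree.parse(mets_path)
    dmdsecs = tree.findall('mets:dmdSec', namespaces=NAMESPACES)
    if not dmdsecs:
        return {}
    # CREATED attributes are ISO 8601 timestamps, so a string sort suffices
    latest = max(dmdsecs, key=lambda d: d.get('CREATED', ''))
    metadata = {}
    for elem in latest.findall('.//dc:*', namespaces=NAMESPACES):
        metadata[etree.QName(elem).localname] = elem.text or ''
    return metadata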
In a future version of the application, it should be possible to upload a new DIP via a REST API (not yet implemented), which will similarly populate the database from the METS file.
Once the DIP has been uploaded, the metadata for the Folder can be edited through the GUI by any user with sufficient permissions.
By default, the application has five levels of permissions:
- Administrators: Administrators have access to all parts of the application.
- Managers: Users in this group can manage users but not make them administrators.
- Editors: Users in this group can add and edit Collections and Folders but not delete them.
- Public: Users with a username/password but no additional permissions have view-only access.
- Unauthenticated: Users who are not logged in can only access the FAQ and login pages.
For more information, see the user management and permissions feature file.
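As an illustration, these levels map naturally onto Django's built-in groups, and a view could then be guarded roughly as below (the group name is an assumption for the sketch; the feature file is authoritative):

from django.contrib.auth.decorators import login_required, user_passes_test
from django.http import HttpResponse

def is_editor(user):
    # Administrators (superusers) pass implicitly; everyone else needs
    # membership in an "Editors" group to add or edit records
    return user.is_superuser or user.groups.filter(name='Editors').exists()

@login_required
@user_passes_test(is_editor)
def new_folder(request):
    # Only Editors and Administrators reach this point; Public users and
    # unauthenticated visitors are redirected away
    return HttpResponse('form goes here')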
The following steps are an example of how to run the application in a production environment on Ubuntu 16.04. Requirements:
- Python 3.4 or higher
- Elasticsearch 6.x
The following environment variables are used to run the application (see the settings sketch after the list):
- ES_HOSTS [REQUIRED]: Comma-separated list of Elasticsearch hosts. RFC-1738 formatted URLs can be used, e.g.: https://user:secret@host:443/
- ES_TIMEOUT: Timeout in seconds for Elasticsearch requests. Default: 10.
- ES_POOL_SIZE: Elasticsearch requests pool size. Default: 10.
- ES_INDEXES_SHARDS: Number of shards for Elasticsearch indexes. Default: 1.
- ES_INDEXES_REPLICAS: Number of replicas for Elasticsearch indexes. Default: 0.
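Conceptually, the Django settings read these variables from the environment along these lines (a sketch under the documented defaults, not the literal settings.py):

import os

ES_HOSTS = os.environ['ES_HOSTS'].split(',')  # required, no default
ES_TIMEOUT = int(os.environ.get('ES_TIMEOUT', 10))
ES_POOL_SIZE = int(os.environ.get('ES_POOL_SIZE', 10))
ES_INDEXES_SHARDS = int(os.environ.get('ES_INDEXES_SHARDS', 1))
ES_INDEXES_REPLICAS = int(os.environ.get('ES_INDEXES_REPLICAS', 0))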
As the root user, install pip and virtualenv:
apt-get update
apt-get upgrade
apt-get install gcc python3-dev
wget https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py
rm get-pip.py
pip install virtualenv
And install Java 8 and Elasticsearch:
apt-get install apt-transport-https openjdk-8-jre
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | apt-key add -
echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | tee -a /etc/apt/sources.list.d/elastic-6.x.list
apt-get update
apt-get install elasticsearch
systemctl daemon-reload
systemctl start elasticsearch
systemctl enable elasticsearch
Verify Elasticsearch is running:
curl -XGET http://localhost:9200
{
  "name" : "ofgAtrJ",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "3h9xSrVlRJmDHgQ8FLnByA",
  "version" : {
    "number" : "6.3.0",
    "build_hash" : "db0d481",
    "build_date" : "2017-02-09T22:05:32.386Z",
    "build_snapshot" : false,
    "lucene_version" : "6.4.1"
  },
  "tagline" : "You Know, for Search"
}
Create a user to own and run the application, log in as that user, and make sure you are in its home folder:
adduser accesspoc
su - accesspoc
cd ~
Create an environment file at ~/accesspoc-env, containing at least the required variable, so it can be referenced where needed. For example:
ES_HOSTS=localhost:9200
Clone the repository and go to its directory:
git clone https://github.com/CCA-Public/dip-access-interface
cd dip-access-interface
Until separate settings files are added and this value is moved to the environment, you'll need to edit accesspoc/accesspoc/settings.py to add the host IP or domain name to the ALLOWED_HOSTS variable. E.g.:
ALLOWED_HOSTS = ['example.com']
Create a Python virtual environment and install the application requirements:
virtualenv venv -p python3
source venv/bin/activate
pip install -r requirements/production.txt
Export the environment for the manage.py commands:
export $(cat ~/accesspoc-env)
Initialize the database:
accesspoc/manage.py migrate
Create search indexes:
accesspoc/manage.py index_data
Add a superuser:
accesspoc/manage.py createsuperuser
Follow the instructions to create a user with full admin rights.
You can now deactivate the environment and go back to the root session:
deactivate && exit
The application requirements install Gunicorn and WhiteNoise to serve the application, including the static files. Back as the root user, create a systemd service file at /etc/systemd/system/accesspoc-gunicorn.service with the following content:
[Unit]
Description=Accesspoc Gunicorn daemon
After=network.target
[Service]
User=accesspoc
Group=accesspoc
PrivateTmp=true
PIDFile=/home/accesspoc/accesspoc-gunicorn.pid
EnvironmentFile=/home/accesspoc/accesspoc-env
WorkingDirectory=/home/accesspoc/dip-access-interface/accesspoc
ExecStart=/home/accesspoc/dip-access-interface/venv/bin/gunicorn \
    --access-logfile /dev/null \
    --worker-class gevent \
    --workers 4 \
    --bind unix:/home/accesspoc/accesspoc-gunicorn.sock \
    accesspoc.wsgi:application
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s TERM $MAINPID
[Install]
WantedBy=multi-user.target
Start and enable the service:
systemctl start accesspoc-gunicorn
systemctl enable accesspoc-gunicorn
To access the service logs, use:
journalctl -u accesspoc-gunicorn
The Gunicorn service uses a Unix socket to listen for connections, so we will use Nginx to proxy the application:
apt-get install nginx
nano /etc/nginx/sites-available/accesspoc
With a basic configuration:
upstream accesspoc {
    server unix:/home/accesspoc/accesspoc-gunicorn.sock;
}

server {
    listen 80;
    server_name example.com;
    client_max_body_size 500M;

    location / {
        proxy_set_header Host $http_host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_redirect off;
        proxy_buffering off;
        proxy_pass http://accesspoc;
    }
}
Link the site configuration to sites-enabled and remove the default configuration:
ln -s /etc/nginx/sites-available/accesspoc /etc/nginx/sites-enabled
rm /etc/nginx/sites-enabled/default
Verify configuration and restart Nginx service:
nginx -t
systemctl restart nginx
Alternatively, the application can be run with Docker, which requires Docker CE and Docker Compose.
Clone the repository and go to its directory:
git clone https://github.com/CCA-Public/dip-access-interface
cd dip-access-interface
Build images, initialize services, etc.:
docker-compose up -d
Initialize database:
docker-compose exec accesspoc ./manage.py migrate
Create search indexes:
docker-compose exec accesspoc ./manage.py index_data
Add a superuser:
docker-compose exec accesspoc ./manage.py createsuperuser
Follow the instructions to create a user with full admin rights.
To keep the Docker image as small as possible, the build dependencies are removed after the requirements are installed. As a result, executing tox inside the container will fail to install those requirements. If you don't have Tox installed on the host and need to run the application tests and syntax checks, use one of the following commands to create a one-off container for that purpose:
docker run --rm -t -v `pwd`:/src -w /src python:3.6 /bin/bash -c "pip install tox && tox"
docker run --rm -t -v `pwd`:/app omercnet/tox
Access the logs:
docker-compose logs -f accesspoc elasticsearch
To access the application with the default options visit http://localhost:43430 in the browser.