/dpla-service-hub

Pilot DPLA Service Hub for Colorado and Wyoming


DP.LA Service Hub

This project provides a lightweight DP.LA aggregation feed and a command-line interface for ingesting bibliographic and metadata vocabularies such as MODS, Dublin Core, and MARC into an RDF triplestore as BIBFRAME 2.0 linked data. It is based on KnowledgeLinks.io's Catalog Pull Platform, which uses the RDF Framework and BIBCAT.

This project started as a pilot for the Colorado/Wyoming DP.LA service hub.

Setup

  1. Clone or fork the project repository:

    git clone https://github.com/KnowledgeLinks/dpla-service-hub.git
    
  2. Initialize and update the submodules:

    cd dpla-service-hub/
    git submodule init
    git submodule update
    
  3. Create an instance directory for configuration and custom RDF rules:

    mkdir instance
    cd instance/
    touch config.py
    

Config.py options

To configure dpla-service-hub, you'll need to add these minimum variables in your config.py file.

  • SECRET_KEY - Random string of characters for seeding Flask
  • BASE_URL - Base URL to use for IRI minting, defaults to http://bibcat.org/
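
As a minimal sketch, an instance/config.py holding just these two settings might look like the following. Generating the key with Python's standard secrets module is an assumption made for illustration, not something the project prescribes; any sufficiently random string should do.

```python
# instance/config.py: illustrative sketch, values are placeholders
import secrets

# Random string used to seed Flask. Generating it at import time (as here)
# produces a new key on every restart; paste in a fixed value instead if
# sessions must survive restarts.
SECRET_KEY = secrets.token_hex(32)

# Base URL used when minting IRIs (this is the documented default).
BASE_URL = "http://bibcat.org/"
```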

Ingestion

For now, the way to ingest records into the triplestore is to open an interactive Python 3 session. Here is an example of setting up your Python environment so that the different source ingesters can be imported:

import sys
sys.path.append("/dpla-service-hub/bibcat")
from ingesters.ingester import NS_MGR, new_graph

Customizing

To customize the field mappings, add common properties, or load other information into the triplestore, add Turtle RDF files to the custom directory. When you create an ingester, include the file name of the Turtle file with the custom parameter so that your custom rules are applied during ingestion.
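
One way to collect the rule file names before creating an ingester is sketched below. The helper name list_rule_files and the idea of scanning the directory are assumptions for illustration; the project itself only needs the names of the files you want applied.

```python
from pathlib import Path


def list_rule_files(custom_dir):
    """Return the sorted names of Turtle (.ttl) files in custom_dir.

    Hypothetical helper: not part of dpla-service-hub or BIBCAT.
    """
    directory = Path(custom_dir)
    if not directory.is_dir():
        # A missing directory simply means there are no custom rules.
        return []
    return sorted(p.name for p in directory.glob("*.ttl") if p.is_file())
```

The resulting list could then be handed to an ingester in place of a hand-written list of file names.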

MARC 21

Create a MARC 21 ingester using a custom RDF rules graph for Colorado College, then transform a sample of Colorado College's MARC 21 records:

import pymarc
import ingesters.marc as marc2bf
marc_ingester = marc2bf.MARCIngester(rules_ttl=['cc-marc-bf-.ttl'])
with open("dpla-service-hub/tmp/cc-marc.mrc", "rb") as fo:
    reader = pymarc.MARCReader(fo, to_unicode=True)
    # Iterate inside the with block: the reader needs the file to stay open
    for record in reader:
        marc_ingester.transform(record=record)
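
Large MARC files often contain a few records that fail to parse or transform. A minimal sketch of a loop that keeps going and reports failures at the end is shown below. transform_all is a hypothetical helper, not part of BIBCAT, and it assumes only that the transform callable accepts a record keyword argument as in the example above. It also skips None entries, which newer pymarc releases can yield for records they cannot decode.

```python
def transform_all(records, transform):
    """Apply transform(record=...) to each record, collecting failures.

    Returns (number_transformed, failures), where failures is a list of
    (position, exception) pairs. Hypothetical helper for illustration.
    """
    transformed = 0
    failures = []
    for position, record in enumerate(records):
        if record is None:  # unreadable record, nothing to transform
            failures.append((position, ValueError("unreadable record")))
            continue
        try:
            transform(record=record)
        except Exception as error:  # keep going; report at the end
            failures.append((position, error))
        else:
            transformed += 1
    return transformed, failures
```

With the ingester above, this would be called as transform_all(reader, marc_ingester.transform) inside the with block, while the file is still open.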

MODS XML

import requests
import xml.etree.ElementTree as etree
import ingesters.mods as mods

Request the MODS XML datastream for a single Fedora object from Colorado College's Islandora repository, then create the ingester and transform the record:

mods_result = requests.get("https://digitalcc.coloradocollege.edu/islandora/object/coccc:26262/datastream/MODS/view")
# Parse the response body into an ElementTree element
mods_xml = etree.XML(mods_result.text)
# Create the ingester only after mods_xml exists, since it is passed in here
mods_ingester = mods.MODSIngester(xml=mods_xml, rules_ttl=["cc-mods-bf.ttl"])
mods_ingester.transform(source=mods_xml)
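
To see what a parsed MODS document looks like before handing it to the ingester, you can inspect it with the standard library. The sketch below uses a small inline MODS fragment (made up for illustration, not taken from the Colorado College repository) and the standard MODS namespace, http://www.loc.gov/mods/v3.

```python
import xml.etree.ElementTree as etree

# Tiny hand-written MODS record, for illustration only.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Sample Title</title></titleInfo>
  <name><namePart>Doe, Jane</namePart></name>
</mods>"""

# Prefix-to-URI map so the paths below can use the "mods:" prefix.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

sample_xml = etree.XML(SAMPLE_MODS)
# Text of the first title directly under the record's titleInfo element
title = sample_xml.findtext("mods:titleInfo/mods:title", namespaces=MODS_NS)
# Every namePart anywhere in the record
names = [part.text for part in sample_xml.iterfind(".//mods:namePart", MODS_NS)]
```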

Dublin Core XML

To test a random collection of Dublin Core RDF XML records from Denver Public Library:

import pickle
import xml.etree.ElementTree as etree
import ingesters.dc as dc
dc_ingester = dc.DCIngester(rules_ttl=['dpl-dc.ttl'])
with open("dpla-service-hub/tmp/sample_recs.pickle", "rb") as fo:
    sample_recs = pickle.load(fo)
for rdf_record in sample_recs:
    # Serialize each record back to XML before transforming it
    dc_ingester.transform(xml=etree.tostring(rdf_record))
    dc_ingester.add_to_triplestore()

Dublin Core CSV

Deploying with Docker and Docker-Compose

This project now supports Docker and Docker Compose. To run the DPLA Service Hub stack, run docker-compose up from the base directory. This builds a bibcat image using the instance/config.py file you created.

Server Aggregation Feed