Ichabod - Hydra at NYU

A prototyping effort to bring Hydra to NYU as a metadata augmentation system.

What is Hydra?

The Hydra group describes Hydra as a "repository solution." True to this description, it aims to solve some of the problems of repository management and discovery. The repository in question is Fedora, which NYU doesn't currently use but which could be brought up as a repository for augmented metadata (with the content itself housed in R-star or the Institutional Repository), while the Hydra stack provides a customizable back-end and front-end interface.

Hydra is really just a collection of interlocking open-source technologies that together provide a simple way to manage a Fedora repository: the hydra gem, Fedora, Solr and Blacklight. The application layer is built on the Ruby on Rails framework, so we can easily integrate NYU SSO and shared assets.

Moving parts

The hydra gem offers a Rails interface for creating, editing and deleting objects in Fedora through ActiveFedora, which creates Ruby models as proxies for the Fedora-based ones. It also provides a DSL for managing access controls in Fedora, built on top of the CanCan authorization gem.
Fedora is the repository itself: where the metadata, and even the content, can be stored (e.g. a book cover image or a PDF of an actual book page).
Solr is a fast, open-source search platform built on Lucene; I believe we are all familiar with it. ActiveFedora uses the solrizer gem to push Fedora objects into a Solr index as they are saved.
Blacklight is a Rails engine gem that acts as a front-end discovery interface for Solr indexes.

Hydra models

In Fedora an object can have many 'datastreams', which hold either content for the object or metadata about the object. We define metadata datastreams as OM (Opinionated Metadata, which describes the format of an XML document in Ruby) terminologies; these map our Ruby attributes to XML that is stored in Fedora and indexed into Solr by ActiveFedora. OM also lets us describe what our Solr index will look like, through each term's index_as option.

For example, a model mirroring a Fedora-based Book object with a single title and multiple authors would look like:

class Book < ActiveFedora::Base
  include Hydra::AccessControls::Permissions

  has_metadata 'descMetadata', type: BookMetadata

  has_many :pages, :property => :is_part_of

  has_attributes :title, datastream: 'descMetadata', multiple: false
  has_attributes :author, datastream: 'descMetadata', multiple: true

end

With a matching OM description:

class BookMetadata < ActiveFedora::OmDatastream

  set_terminology do |t|
    t.root(path: "fields")
    t.title index_as: :stored_searchable
    t.author index_as: :stored_searchable
  end

  def self.xml_template
    Nokogiri::XML.parse("<fields/>")
  end

end
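To make the mapping concrete, here is a plain-Ruby sketch (no ActiveFedora, OM or Solr needed) of roughly what the terminology above produces for one book: the XML stored in the descMetadata datastream and the Solr document built from it. The helper names are hypothetical, and the _tesim suffix is an assumption based on Solrizer's naming convention for :stored_searchable terms (stored, indexed, multivalued text).

```ruby
# Hypothetical stand-ins for what OM and Solrizer do with BookMetadata;
# no ActiveFedora, OM or Solr is needed to run this.

# Minimal XML escaping so values cannot break the markup.
def xml_escape(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Mirror t.root(path: "fields") plus one element per term value.
def desc_metadata_xml(attrs)
  children = attrs.flat_map do |term, values|
    Array(values).map { |v| "<#{term}>#{xml_escape(v)}</#{term}>" }
  end
  "<fields>#{children.join}</fields>"
end

# Assumption: :stored_searchable terms become multivalued "<term>_tesim" fields.
def solr_doc(id, attrs)
  attrs.each_with_object({ "id" => id }) do |(term, values), doc|
    doc["#{term}_tesim"] = Array(values)
  end
end

book = { title: "Good Omens", author: ["Terry Pratchett", "Neil Gaiman"] }
puts desc_metadata_xml(book)
# <fields><title>Good Omens</title><author>Terry Pratchett</author><author>Neil Gaiman</author></fields>
p solr_doc("book:1", book)
```

Note that the multiple: true author attribute simply becomes repeated XML elements and a multivalued Solr field.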

Gated access

Hydra fully supports gated access. In fact it's an integral part of Hydra's architecture.

Access privileges

By default there are three levels of access that can be granted: Discover, Read and Edit.
Custom privileges can be created as well. With CanCan you can!
Privileges can be defined at the user or group level. There is also the possibility of a configurable role mapper for assigning users to groups en masse based on an external system (e.g. Aleph).
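The way these levels combine can be sketched in plain Ruby. This is a hypothetical model of the rules, not Hydra's actual ability class: it assumes each level implies the ones below it (edit implies read, read implies discover), that everyone belongs to the "public" group and signed-in users also to "registered", and it uses a stubbed role mapper in place of a lookup against an external system.

```ruby
# Hypothetical sketch of how Discover/Read/Edit grants resolve for a user.
# Assumption: each level implies the ones below it (edit => read => discover).
LEVELS = [:discover, :read, :edit].freeze

# Stub role mapper: a real one might look group membership up in an
# external system such as Aleph.
ROLE_MAP = { "user456" => ["nyu_staff"] }.freeze

# Everyone is in "public"; signed-in users are also "registered".
def groups_for(user)
  return ["public"] if user.nil?
  ["public", "registered"] + ROLE_MAP.fetch(user, [])
end

# grants looks like { read: { users: [...], groups: [...] }, ... }
def can?(user, level, grants)
  groups = groups_for(user)
  # A grant at the requested level or any higher level is sufficient.
  LEVELS.drop(LEVELS.index(level)).any? do |lvl|
    grant = grants.fetch(lvl, {})
    Array(grant[:users]).include?(user) || (Array(grant[:groups]) & groups).any?
  end
end

book = { discover: { users: ["user123"] }, read: { groups: ["nyu_staff"] } }
can?("user123", :discover, book) # true: granted directly
can?("user123", :read, book)     # false: discover only
can?("user456", :discover, book) # true: read via nyu_staff implies discover
```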

Granting access with the Rails console

To grant access on an object to a user through the console:

b = Book.all.last
b.discover_users = ["user123"]
b.save

Now user123 has discover access to this book and can find it in searches, but will not be able to click through to its details unless read privileges are also granted, or edit it unless edit privileges are granted.

To grant access for all registered (i.e. signed-in) users to all Book objects we can do something like:

Book.all.each {|book| book.read_groups = ["registered"]; book.save }

Because these permissions are just attributes on the object (e.g. read_groups), checkboxes can easily be integrated into views to let admin users grant access to the objects they manage. Thanks Rails!
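A rough sketch of the server side of such a checkbox form, in plain Ruby: the submitted lists are filtered to known permission keys, cleaned, and assigned through the model's setters (the same read_groups= style used in the console examples). The parameter layout, the helper names and FakeBook are hypothetical; FakeBook only stands in for the Book model so the sketch runs without ActiveFedora.

```ruby
# Hypothetical sketch: turn checkbox params from an admin form into the
# permission attributes the model exposes (read_groups=, edit_users=, ...).
PERMISSION_KEYS = %w[
  discover_users discover_groups read_users read_groups edit_users edit_groups
].freeze

# Keep only known permission keys; drop blanks such as the empty hidden
# value that Rails' collection_check_boxes helper submits, and de-duplicate.
def permission_attributes(params)
  params.slice(*PERMISSION_KEYS).transform_values do |value|
    Array(value).map(&:to_s).reject(&:empty?).uniq
  end
end

# Assign each permitted list through the model's setter, e.g. read_groups=.
def apply_permissions(record, params)
  permission_attributes(params).each { |key, values| record.public_send("#{key}=", values) }
  record
end

# Stand-in for a Book so the sketch runs without ActiveFedora.
FakeBook = Struct.new(*PERMISSION_KEYS.map(&:to_sym))

book = apply_permissions(FakeBook.new, "read_groups" => ["", "registered"], "edit_users" => "user123")
book.read_groups # ["registered"]
```

Whitelisting the keys matters here because the setter is chosen from user-submitted input.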

Oh, and did I mention these permissions are indexed into Solr as well? They can also be read directly out of Fedora if they are already defined there, so no work has to be duplicated.
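Having permissions in Solr is what makes gated discovery possible: a search can carry a filter query so that only documents the current user may at least discover come back. The plain-Ruby sketch below builds such a filter string. The field names are an assumption modeled on the hydra-access-controls style (e.g. read_access_group_ssim), and the query shape is a simplification rather than what Hydra actually emits.

```ruby
# Hypothetical sketch of a gated-discovery filter query (fq) for Solr.
# Assumption: fields are named <level>_access_group_ssim / <level>_access_person_ssim.
LEVELS = %w[discover read edit].freeze

# Quote a value for Solr, escaping embedded quotes and backslashes.
def solr_quote(value)
  %("#{value.to_s.gsub(/["\\]/) { |c| "\\#{c}" }}")
end

# Any discover, read or edit grant to the user or one of their groups
# is enough for the document to show up in results.
def gated_discovery_fq(user, groups)
  clauses = LEVELS.flat_map do |level|
    group_clauses = groups.map { |name| "#{level}_access_group_ssim:#{solr_quote(name)}" }
    user_clauses = user ? ["#{level}_access_person_ssim:#{solr_quote(user)}"] : []
    group_clauses + user_clauses
  end
  clauses.join(" OR ")
end

puts gated_discovery_fq("user123", ["public", "registered"])
```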

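Starting from scratch

To rebuild the Solr index from every object stored in Fedora (for example, after wiping Solr), run the following from the Rails console:

ActiveFedora::Base.reindex_everything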
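The reason this is safe is that Fedora is the source of truth and Solr only holds a derived copy. A plain-Ruby sketch of the idea, using hypothetical in-memory hashes for both stores and assumed field names; unlike the real method, it builds a fresh index instead of updating the existing one, which is a simplification.

```ruby
# Hypothetical in-memory stand-ins: FEDORA is the source of truth,
# and the Solr index is derived entirely from it.
FEDORA = {
  "book:1" => { title: "Moby-Dick", read_groups: ["registered"] },
  "book:2" => { title: "Good Omens", read_groups: ["public"] }
}.freeze

# Convert one stored object into a Solr document (field names assumed).
def to_solr(pid, obj)
  {
    "id" => pid,
    "title_tesim" => [obj[:title]],
    "read_access_group_ssim" => obj[:read_groups]
  }
end

# Rebuild the index by visiting every stored object once.
def reindex_everything(fedora)
  fedora.each_with_object({}) { |(pid, obj), solr| solr[pid] = to_solr(pid, obj) }
end

solr = reindex_everything(FEDORA)
solr.keys # ["book:1", "book:2"]
```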

Big wins that come with Hydra out of the box

Defining additional metadata
Defining access controls per object or class of objects
Defining relationships between objects (Fedora's RDF relationships are translated into ActiveRecord-like associations)
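Conceptually, the has_many :pages, property: :is_part_of line in the Book model reads an RDF relationship backwards: a book's pages are the objects whose is_part_of points at the book. The plain-Ruby sketch below models that with hypothetical subject-predicate-object triples; it illustrates the lookup direction only and is not how ActiveFedora resolves associations internally.

```ruby
# Hypothetical triples, [subject, predicate, object], in the spirit of
# Fedora's RDF relationships.
TRIPLES = [
  ["page:1", :is_part_of, "book:1"],
  ["page:2", :is_part_of, "book:1"],
  ["page:3", :is_part_of, "book:2"]
].freeze

# has_many direction: subjects whose predicate points at the given object.
def inbound(triples, object, predicate)
  triples.select { |_s, p, o| p == predicate && o == object }.map(&:first)
end

# Inverse direction (a page's book): follow the predicate forwards.
def outbound(triples, subject, predicate)
  triples.select { |s, p, _o| s == subject && p == predicate }.map(&:last)
end

inbound(TRIPLES, "book:1", :is_part_of)  # ["page:1", "page:2"]
outbound(TRIPLES, "page:3", :is_part_of) # ["book:2"]
```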

Sample Data Ingest

Rake tasks are available to ingest data from the "ingest" directory.

rake ichabod:load["./ingest/sdr.xml","sdr"]
rake ichabod:load["./ingest/stern.xml","fda"]

... and to purge data based on the same data files.

rake ichabod:delete["./ingest/sdr.xml","sdr"]
rake ichabod:delete["./ingest/stern.xml","fda"]
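For readers unfamiliar with the bracket syntax, the commands above pass two positional task arguments. The sketch below shows how a pair of Rake tasks with arguments could be declared so that delete can undo load from the same file. Everything inside the task bodies is hypothetical: it assumes the second argument acts as an ID prefix, stubs out the XML parsing, and uses an in-memory hash in place of Fedora. It is not the project's actual implementation.

```ruby
require "rake"

# Hypothetical in-memory stand-in for Fedora, keyed by PID.
REPOSITORY = {}

# Stub: a real task would parse the XML file; here each file maps to fixed IDs.
SAMPLE_RECORDS = { "./ingest/sdr.xml" => ["rec1", "rec2"] }.freeze

def read_record_ids(file)
  SAMPLE_RECORDS.fetch(file, [])
end

namespace :ichabod do
  desc "Load records from FILE under PREFIX (hypothetical sketch)"
  task :load, [:file, :prefix] do |_task, args|
    read_record_ids(args[:file]).each do |id|
      REPOSITORY["#{args[:prefix]}:#{id}"] = { source: args[:file] }
    end
  end

  desc "Delete the records FILE would have loaded under PREFIX (hypothetical sketch)"
  task :delete, [:file, :prefix] do |_task, args|
    # Re-reading the same file reproduces the same IDs, so delete mirrors load.
    read_record_ids(args[:file]).each do |id|
      REPOSITORY.delete("#{args[:prefix]}:#{id}")
    end
  end
end
```

Deriving IDs deterministically from the file and the prefix is what lets the delete task work from the same data files as the load task.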

Resources

Homepage: http://projecthydra.org/
Dive Into Hydra Tutorial: https://github.com/projecthydra/hydra/wiki/Dive-into-Hydra
Access Controls with Hydra Tutorial: https://github.com/projecthydra/hydra-head/wiki/Access-Controls-with-Hydra
Tame your XML with OM Tutorial: https://github.com/projecthydra/om/wiki/Tame-your-XML-with-OM
Project Hydra on GitHub: https://github.com/projecthydra/hydra
GitHub Wiki: https://github.com/projecthydra/hydra/wiki/
Duraspace Wiki: https://wiki.duraspace.org/display/hydra/The+Hydra+Project
IRC Channel: #projecthydra

chrpr/ichabod