Islandora/documentation

Use Case: DOI minting

bryjbrown opened this issue · 22 comments

Title (Goal) Mint DOIs for new content
Primary Actor user
Scope Drupal
Level Medium
Story I am a user creating a new repository object, I want to be able to mint a new DOI automatically from the submission form.

Implementation ideas: perhaps have a DOI field that, if empty, triggers a checkbox that says "Mint DOI?" and, if checked, triggers an action at the Drupal level that farms out the DOI minting request to whatever vendor backend you have configured, and sticks the newly minted DOI in the DOI field. This should be seamless from the user's perspective, but completely configurable from the repository administrator's perspective.
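To make the "empty field plus checkbox" idea concrete, here is a minimal, language-neutral sketch in Python rather than actual Drupal code. The names `handle_submission`, `mint_doi` and `fake_mint` are hypothetical, and the returned DOI is a made-up value; a real implementation would call whatever vendor backend the administrator has configured.

```python
from typing import Callable, Dict


def handle_submission(
    values: Dict[str, object],
    mint: Callable[[Dict[str, object]], str],
) -> Dict[str, object]:
    """Return a copy of the submitted values, minting a DOI only when asked.

    A DOI is minted only if the DOI field is empty AND the (hypothetical)
    "mint_doi" checkbox was ticked. An existing DOI is never overwritten.
    """
    result = dict(values)
    existing = str(result.get("doi") or "").strip()
    if not existing and result.get("mint_doi"):
        result["doi"] = mint(result)
    return result


def fake_mint(values: Dict[str, object]) -> str:
    # Stand-in for the configured vendor backend (assumption for illustration).
    return "10.5072/example-1"
```

From the user's side this is a single checkbox; which backend `mint` points at stays an administrator decision.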

@mjordan's 7.x islandora_doi_framework module provided an abstract DOI handling API and put the DataCite API specifics into a submodule so that the main module could be extended to work with other vendor backends through submodules as well (like CrossRef). This should be like that. Drupal 8's search_api and search_api_solr are another example of this pattern in practice.

My specific use case is for DOIs, but if we are trying to make this an abstract API, would it make sense to start at the level of generic identifiers that need to be minted from a third party? This would allow for the use of other DOI-like identifiers (like ARKs and Handles) too.

@bryjbrown absolutely make this generic for identifiers. Making an easy way to mint ARKs for repository objects was already on our local feature list for future development. I was also going to have it update the object's alias.

We "mediate" minting of DOIs, where an admin does it after submission as part of a workflow, so as a variation of this use case I'd like to suggest:

Title (Goal) Mint DOIs for new content
Primary Actor admin/privileged user
Scope Drupal
Level Medium
Story I am a sufficiently privileged user, and as part of my workflow I need to generate a DOI for an already-submitted article, independent of the user who submitted the object.

For the record, here's the repo for the 7.x module: https://github.com/SFULibrary/islandora_doi_framework.

Should DOIs/ARKs be indexed in triple store instead of Drupal urls?

Would we use it in establishing internal relationships (ex for taxonomies)?

@Natkeeran good question, and it would make sense to do that, but if we use a DOI, etc. all calls to it would be dereferenced back to Drupal with a 302 (or some other type of redirect). If the DOI's target URL ever got stale (this happens), the redirect would likely result in a 404. So I'd vote for no, not replacing Drupal URLs in the triplestore with DOIs/ARKs or using them for internal relationships.

So I'm trying to flex my nascent Drupal 8 muscles and figure out the best way this could be implemented. Here are my thoughts so far:

  1. The preferred nomenclature for things like DOIs, ARKs & Handles seems to be PIDs (persistent identifiers). This term has some baggage in the Islandora community, but considering the general nature of the term I don't think it should stand in the way of us using it if it is indeed the right word to use to refer to all of these mintable external PURL-esque IDs.

  2. Perhaps the best way to do what we want in D8 is to provide a PID handling service. The service should be able to take a node's metadata and mint a PID for it, or take a PID and return metadata about it. We could have a generic abstract PID service that provides a standard interface, and then make instantiatable subclasses (subservices?) of that PID service for specific types of PIDs like DOIs, ARKs, and Handles.

  3. This module would also provide a Plugin Type that describes plugins that can be written to connect these specific PID subservices with specific vendor backends (e.g., a DataCite plugin that powers the DOI service, or a CrossRef plugin that does the same thing). You enable a submodule that defines the plugin for the vendor you want to use and configure it through an admin interface for the plugin, and then you select it as the provider for the subservice through the subservice's admin interface.

  4. Once the services are done, we provide all sorts of handy dandy Drupal candy that can fit into folks' workflows like actions that can be triggered or whatever else people find useful.
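The shape described in points 2 and 3 (a generic PID service with swappable vendor backends) could look roughly like the sketch below. It is written in Python only to show the structure; in Drupal 8 this would be a service plus a plugin type. Every name here (`PidBackend`, `PidService`, `InMemoryDoiBackend`) is hypothetical, and the in-memory backend stands in for a real vendor such as DataCite or CrossRef.

```python
from abc import ABC, abstractmethod
from typing import Dict


class PidBackend(ABC):
    """Hypothetical interface that each vendor plugin would implement."""

    pid_type: str  # e.g. "doi", "ark", "handle"

    @abstractmethod
    def mint(self, metadata: Dict[str, str]) -> str:
        """Create a new PID for the given metadata and return it."""

    @abstractmethod
    def lookup(self, pid: str) -> Dict[str, str]:
        """Return the metadata registered for an existing PID."""


class InMemoryDoiBackend(PidBackend):
    """Toy backend for illustration; a real one would call a vendor API."""

    pid_type = "doi"

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._records: Dict[str, Dict[str, str]] = {}

    def mint(self, metadata: Dict[str, str]) -> str:
        pid = f"{self.prefix}/{len(self._records) + 1}"
        self._records[pid] = dict(metadata)
        return pid

    def lookup(self, pid: str) -> Dict[str, str]:
        return self._records[pid]


class PidService:
    """Generic entry point; it knows nothing about any specific vendor."""

    def __init__(self) -> None:
        self._backends: Dict[str, PidBackend] = {}

    def register(self, backend: PidBackend) -> None:
        # One configured backend per PID type in this sketch.
        self._backends[backend.pid_type] = backend

    def mint(self, pid_type: str, metadata: Dict[str, str]) -> str:
        return self._backends[pid_type].mint(metadata)

    def lookup(self, pid_type: str, pid: str) -> Dict[str, str]:
        return self._backends[pid_type].lookup(pid)
```

Actions or workflow hooks would then call only `PidService`, so swapping DataCite for CrossRef means registering a different backend and nothing else.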

As anyone who has any experience with OOP or D8 internals can tell, I am absolutely grasping at straws here and struggling to find the right words (and architectural components) to express the main idea, which is a completely decoupled approach that uses core parts of Drupal to make something very extensible. If anything I'm proposing seems like obviously the wrong way to go, please leave some feedback here and point me in the right direction.

@bryjbrown ++ to all of your ideas here from me. I love the idea of being able to use actions+context for minting PIDs. I'll let more experienced Drupal 8 devs weigh in on the details, but in general, 👍

@bryjbrown

Not sure if I understand this fully, sorry, so I will try to ask some questions and make some statements here:

For DOIs, ARKs and Handles: the minting of such a persistent URI is done on a Handle server (Handle.net), and normally you have to pay for the service and run it locally (it's a Java servlet). Resolution for those Handles is also done externally, by that service, which redirects to your domain and then to a given local URI (let's say a URI like yourdomain.edu/node/1). As with any alternative ID (the real ID being the node's URL), you just have to keep the minted identifier somewhere in your metadata (choose a predicate for it, and probably a different one for each type of persistent URI). But that is just informative. The Handle server is really the one doing the heavy work, and no redirection, resolving, etc. happens in Drupal 8 at all. Is that right? Are there any other use cases?

About the service you mention: it's a pretty slim one (for Handles; DOIs probably require sending more metadata? I'd have to look at the API), basically a call to a Handle server's (local) API. The DGI 7.x module does that pretty well, and in D8 it is no more than a Guzzle request and a response you put back somewhere in the node's metadata. The plugin system you mention comes in handy, and to be honest you don't even need to write something new. There is quite a lot of existing code that deals with external REST-type calls in D8 and with reusing the results, and yes, a plugin manager and custom plugins are the better solution (well, 90% of everything in Drupal is a plugin). That kind of answers 2. and 3.? It comes down to building a custom plugin and maybe a D8 service + plugin manager that covers most of the data/use cases you need.

Some things you may need to look out for:

First: state/workflow concerns. You need to trigger this on node save (not presave), since the node ID and the final URL are set there. Presumably you cannot request a stable/persistent URI for a local URI that does not exist yet or that is not accessible (as in a moderation/draft state)?

Second: tests and documentation. Provide people with a way of testing this without signing a commercial agreement with Handle.net, etc. I find that piece the most challenging; without it, this becomes a use case for just a few privileged users (universities).

FYI https://www.drupal.org/project/orcid exists, someone should give it a test, linking to #859.

I think @ajstanley and @rosiel have evaluated that module @mjordan. And also potentially another one focused on adding information from external calls: https://www.drupal.org/project/external_data_source

Neat, that external data sources module looks pretty cool.

Addendum: from what I can tell from the description, this module doesn't persist data to the Drupal node, it just fetches it at view time and renders it in a field. I'll have to look further to confirm that though.

https://www.drupal.org/project/external_data_source lets you define a field and populate its select list, etc. values from an external API (just as the project page illustrates). Only comes with one plugin, which fetches a list of countries from https://restcountries.eu/. Could be useful (e.g., if a campus has an API listing current courses) but it's not applicable to this use case as far as I can tell.

Should we be including globally resolvable identifiers like DOIs in the Links produced as part of signposting?

@mjordan I think the answer to this is "yes" based on what I interpret http://signposting.org/identifier/ to be saying, although I readily admit I may be misunderstanding it. They seem to imply that all of the different representations of a resource (HTML, PDF, JSON, etc) as well as the resource's "landing page" that links to these representations, all of them should be using the same stable permanent URI as the Link: <http://your.stable.url/goes/here> ; rel="cite-as" HTTP header. The way they organize the examples, it looks like using a DOI or some other unique external identifier is preferable to using the system's URI because it helps tell the crawler that this is the same resource that might have been described on other systems. It should also be fine to use the system's URI though so long as it won't change.

Yeah, the examples are not very clear. I find it strange that none of the examples start with "Repository Stable HTTP URI" -> and ends up with "-> DOI." But, the first example under "DOI/Handle ➞ Landing Page ➞ Publication Resources" does show how you would include the DOI in the Link headers: Link: <https://doi.org/10.1017/jns.2015.28>; rel="cite-as".

Does it make sense to always include the DOI Link header if it's present? If so, how do we tell if it's present (that is, how do we distinguish it from other identifiers)? What about ARKs, PURLs, and other types of globally addressable identifiers?

Offering an answer to my own question, the typed relation Identifier field can take a type of DOI, so it might make sense to include Link headers from the Identifier field of specific types, e.g. DOI, ARK, etc.
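As a rough illustration of that idea, the sketch below turns typed identifiers into `cite-as` Link header values and skips any type it does not recognise. It is Python, not Drupal code; the function name `cite_as_links`, the `(type, value)` tuple shape and the `RESOLVERS` table are assumptions made for the example, and which identifier to prefer when a node has several is left open.

```python
from typing import Iterable, List, Tuple

# Public resolver base URLs per identifier type (assumed mapping for this sketch).
# ARK values are expected to include the "ark:/" label themselves.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/",
}


def cite_as_links(identifiers: Iterable[Tuple[str, str]]) -> List[str]:
    """Build Link header values for identifiers whose type has a known resolver."""
    links = []
    for id_type, value in identifiers:
        base = RESOLVERS.get(id_type.lower())
        if base is None:
            continue  # e.g. local call numbers are not globally resolvable
        links.append(f'<{base}{value}>; rel="cite-as"')
    return links
```

For example, `cite_as_links([("DOI", "10.1017/jns.2015.28"), ("local", "abc")])` yields only the DOI link, matching the header shown in the signposting example.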

This was mentioned at the Fedora Tech call around some work done at the University of Houston.
https://uh-ir.tdl.org/handle/10657/1903

I am interested in connecting islandora to a handle service and am planning on digging into this if no one else has already tackled it. Is it correct that no one has built a module to mint handles for islandora nodes/media?

@elizoller not that I'm aware of. I'm planning on implementing some ARK minting capability using the EZID API, but it is further down my list.

An update to say that the Robertson Library's RDM project has a functioning DOI minting submodule, islandora_rdm_datacite_doi. Unfortunately it's not a generalized system, but it works for datasets with DataCite because that was within scope for our project. It requires our islandora_rdm_datacite module, which creates the API for DataCite, and requires the use of our "dataset" content type and its field structure, which was designed to populate a valid DataCite XML document. It does not mint on object creation; instead, using Content Moderation/Workflow, it mints a DOI when the dataset enters a state of Published (and, I believe, only if the user-editable DOI field was not populated already).
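The minting rule described here (mint on the transition into Published, and only when the DOI field is still empty) is small enough to sketch. This is not code from islandora_rdm_datacite_doi; the function name `should_mint` and the state string `"published"` are assumptions for illustration.

```python
from typing import Optional


def should_mint(previous_state: Optional[str], new_state: str, doi_field: Optional[str]) -> bool:
    """Decide whether a moderation-state change should trigger DOI minting."""
    # Only the transition INTO "published" counts; re-saving an already
    # published item must not mint a second DOI.
    entering_published = new_state == "published" and previous_state != "published"
    # Respect a DOI that a user has already entered by hand.
    has_doi = bool((doi_field or "").strip())
    return entering_published and not has_doi
```

Keeping the decision in one pure function makes it easy to test without a DataCite account, since no network call is involved until the function returns `True`.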

amyrb commented

The I8 Delta Doc requirements include a number of requirements related to DOI and other persistent identifiers, which I wanted to list here for reference:

  • Support multiple persistent identifier types, such as DOIs, PURLs, EZIDs, ARKs and HDLs, and be extensible to allow for new persistent identifier types to be added
  • Support for a range of custom triggers for minting persistent identifiers, for example through Rules or in a webform
  • Support for a range of patterns for customizing the IDs minted by the system, for example DOI prefixes and custom suffixes of variable length
  • Support for a multitude of backend registrants that take the newly minted DOI and register it with a vendor like DataCite or CrossRef
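For the requirement about ID patterns, a minimal sketch of one possible scheme follows: a fixed prefix, an optional shoulder, and a random suffix of configurable length. The function name `make_doi`, the `shoulder` parameter and the choice of alphabet are assumptions, not something the Delta Doc specifies, and the prefix in the usage note is a placeholder.

```python
import secrets

# 32 characters: digits plus lowercase letters without i, l, o, u,
# which avoids look-alike characters in printed identifiers.
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def make_doi(prefix: str, suffix_length: int = 8, shoulder: str = "") -> str:
    """Build a DOI string from a prefix, optional shoulder, and random suffix."""
    if suffix_length < 1:
        raise ValueError("suffix_length must be at least 1")
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_length))
    return f"{prefix}/{shoulder}{suffix}"
```

Calling `make_doi("10.5072", suffix_length=6, shoulder="fk2")` returns a string such as `10.5072/fk2` followed by six random characters. Registering that string with a vendor and checking it for collisions are separate steps not shown here.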

https://github.com/discoverygarden/dgi_actions was written to be extensible / generic enough to integrate with more services as required. It allows you to choose what entity/bundle/field that the resulting metadata goes into and so on. Currently there are Handle, EZID and ARK modules.