inveniosoftware/invenio

Vocabularies Sprint Goal (Dec 2020)

lnielsen opened this issue · 0 comments

The goal of the sprint is to add vocabularies support in InvenioRDM as a small layer on top of a reusable vocabularies support in Invenio.

Subgoals

Add the following vocabularies to InvenioRDM:

  • License (SPDX vocabulary)
  • Subjects (mixed free text + others)
  • Affiliations (ROR.org source)
  • Languages (ISO-639-3 source)

For each vocabulary, this includes:

  • UI: The user interface widget and its integration in the RDM deposit form.
  • API: The REST API for the vocabulary itself, as well as its integration in the InvenioRDM record.
  • Display: The proper display of localized values from the vocabulary in search results and landing pages.
  • Import: The import from the vocabulary data source.
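As an illustration of the import step, here is a minimal sketch that reads a tab-separated source and turns each row into a generic vocabulary record. The two-column source layout (`id`, `name`), the record shape and the function name `import_vocabulary` are all assumptions made for this example; they are not the actual ISO-639-3, SPDX or ROR file formats, nor the Invenio-Vocabularies data model.

```python
import csv
import io

# Hypothetical tab-separated source: one language per row, with an
# "id" column (three-letter code) and a "name" column (English name).
SAMPLE_SOURCE = "id\tname\neng\tEnglish\ndan\tDanish\nfra\tFrench\n"


def import_vocabulary(source_text, vocabulary_type):
    """Convert tab-separated rows into generic vocabulary records.

    The record layout (id, type, title keyed by locale) is an assumption
    made for this sketch, not the final data model.
    """
    reader = csv.DictReader(io.StringIO(source_text), delimiter="\t")
    records = {}
    for row in reader:
        records[row["id"]] = {
            "id": row["id"],
            "type": vocabulary_type,
            "title": {"en": row["name"]},
        }
    return records


languages = import_vocabulary(SAMPLE_SOURCE, "languages")
print(sorted(languages))
```

Keeping the importer independent of the vocabulary type is what would let the same code path serve licenses, affiliations and languages, with only the source parsing differing.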

What does a successful outcome look like?

End-user

For an end-user, a successful sprint outcome is visible in the deposit form and in search results/landing pages (e.g. aggregations, labels, etc.).

Deposit form

Affiliation:

  • As an uploader, I want organisation names auto-completed in the affiliation field, so that I can save time.
  • As an uploader, I want to be able to specify organisation names not yet in the database, so that I'm not constrained on the input.

License:

  • As an uploader, I want help to select the license for my upload, so that I know what it means.
  • As an uploader, I want to be presented with the best license choice for my upload, so that I don't have to think.
  • As an uploader, I want to be able to specify a custom license text not known to the system, so that I can provide the correct license.

Subjects:

  • As an uploader, I want subjects to be auto-completed, so that I can use the same subjects as other uploaders.
  • As an uploader, I want to limit auto-completion to specific vocabularies, so that I don't get irrelevant suggestions.
  • As an uploader, I want to provide free text keywords, so that I can include keywords exactly as they are on my journal article.

Languages:

  • As an uploader, I want to quickly specify multiple languages for my upload, so that I save time.
  • As an uploader, I want to see both the full name and the language code.

Mockup

[Image: Deposit - Protection v2]

Search results/landing page

In search results, vocabularies are often used in facets or for showing classification labels and similar (see below):

  • As a visitor, I want to see human-readable titles in facets in my own language, so that the site looks professional and is understandable.
  • As a visitor, I want to see human-readable subjects on both landing pages and search result items, so that the site looks professional and is understandable.
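The "in my own language" part of these stories implies some fallback rule for when a title is not translated. A minimal sketch of one possible rule follows; the entry layout (`title` keyed by locale) and the function name `localized_title` are hypothetical.

```python
def localized_title(entry, locale, default_locale="en"):
    """Pick the best available title for a vocabulary entry."""
    titles = entry.get("title", {})
    # Try the full locale ("da_DK"), then its language part ("da"),
    # then the site default, and finally fall back to the raw identifier.
    candidates = [locale, locale.split("_")[0], default_locale]
    for candidate in candidates:
        if candidate in titles:
            return titles[candidate]
    return entry["id"]


danish = {"id": "dan", "title": {"en": "Danish", "da": "Dansk"}}
print(localized_title(danish, "da_DK"))
print(localized_title(danish, "fr"))
```

Falling back to the identifier as a last resort means a facet or label is never rendered empty, even for an entry that has no titles at all.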

[Image: Screenshot 2020-11-27 at 14 13 07]

Plan

The work is divided into two parallel tracks:

  • Frontend track: Building the necessary UI widgets.
  • Backend track: Building the REST APIs to support the UI widgets and rendering of search results.

Frontend track

  • Analyse, mock up and plan generic reusable widgets and their integration with Formik in the deposit form.
    • The UI widget itself.
    • The state management and data management.
    • Affiliation widget: auto-complete a single value, but allow non-vocabulary items as well.
    • License widget: auto-complete a single value using a modal select box (to be designed)
    • Subjects: auto-complete multiple values, with a possibility to limit to a specific subject scheme, and a possibility to specify subject terms not in the vocabulary.
    • Languages: auto-complete multiple values, restricted to only allowing vocabulary items.
  • Build the widgets with mock data.
  • Integrate them in the deposit form state and
  • Integrate the widgets with the REST API.
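The four widgets above differ mainly in three options: single or multiple values, an optional scheme filter, and whether non-vocabulary values are allowed. The suggestion behaviour behind them can be sketched independently of the UI. The sketch below is written in Python for illustration only (the real widgets would live in the frontend), and the function name `suggest`, the entry layout and the scheme names are hypothetical.

```python
SUBJECTS = [
    {"id": "a-1", "title": "Physics", "scheme": "scheme-a"},
    {"id": "a-2", "title": "Physiology", "scheme": "scheme-a"},
    {"id": "b-1", "title": "Physical chemistry", "scheme": "scheme-b"},
]


def suggest(query, entries, scheme=None, allow_free_text=False, limit=5):
    """Return suggestions for what the user has typed so far."""
    text = query.strip()
    needle = text.lower()
    if not needle:
        return []
    # Prefix match on the title, optionally limited to one scheme.
    matches = [
        entry
        for entry in entries
        if entry["title"].lower().startswith(needle)
        and (scheme is None or entry.get("scheme") == scheme)
    ]
    exact = any(entry["title"].lower() == needle for entry in matches)
    suggestions = matches[:limit]
    # Widgets that accept non-vocabulary values offer the typed text too.
    if allow_free_text and not exact:
        suggestions.append({"id": None, "title": text})
    return suggestions


print([s["title"] for s in suggest("phys", SUBJECTS, scheme="scheme-b")])
```

Under these assumptions, the languages widget would call `suggest` with `allow_free_text=False`, while the affiliation and subjects widgets would pass `allow_free_text=True`.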

If blocked by others:

  • Improve deposit form UX.

Backend track

  • Design: See RFC 40 and 41.

  • Parallel track 1: Build the first REST API vocabularies endpoint into InvenioRDM (/api/vocabularies/languages).

    • Build basic record type factory into Invenio-Records-Resources
    • Build Invenio-Vocabularies module with a generic vocabulary (data, service, presentation layer) using common definitions.
    • Integrate the Invenio-Vocabularies generic module into InvenioRDM.
    • Build a way to import vocabularies into the generic vocabulary.
    • Import the licenses and languages vocabularies into the generic model.
    • CHECKPOINT: /api/vocabularies/languages/ and /api/vocabularies/licenses working and delivering data. This allows the frontend track to integrate it into the widgets.
  • Parallel track 2: Build the machinery for linking records

    • Build basic relations support into Invenio-Records (integrity checking, dereferencing, indexing)
      • Note: Invenio v3.4 sprint team depends on getting a final Invenio-Records release as early as possible.
    • Build basic scan() and reindex() support into Invenio-Records-Resources
    • CHECKPOINT: ability of data layer to check integrity, dereference a record, and index (allows integrating vocabulary into bibliographic record).
  • Integrate languages vocabulary into the RDM bibliographic record in Invenio-RDM-Records

    • Data layer: integrity checking, dereferencing and indexing
    • Service layer: a marshmallow schema
    • Presentation layer: show in
    • CHECKPOINT: REST API is now fully supporting the linking of languages
  • Build a facet over languages

    • Build programmatic vocabulary API
    • CHECKPOINT: Search results facet over languages is working.
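The relations work in parallel track 2 (integrity checking and dereferencing before indexing) can be sketched with plain dictionaries. This is not the Invenio-Records API: the names `BrokenRelation`, `check_integrity` and `dereference`, and the record layout, are assumptions made for this example.

```python
LANGUAGES = {
    "eng": {"id": "eng", "title": {"en": "English"}},
    "dan": {"id": "dan", "title": {"en": "Danish"}},
}


class BrokenRelation(Exception):
    """Raised when a record links to a vocabulary entry that does not exist."""


def check_integrity(record, vocabulary):
    """Verify that every linked language id exists in the vocabulary."""
    for link in record.get("languages", []):
        if link["id"] not in vocabulary:
            raise BrokenRelation(f"unknown language: {link['id']}")


def dereference(record, vocabulary):
    """Return a copy of the record with linked entries expanded for indexing."""
    check_integrity(record, vocabulary)
    expanded = dict(record)
    expanded["languages"] = [
        dict(vocabulary[link["id"]]) for link in record.get("languages", [])
    ]
    return expanded


record = {"title": "My upload", "languages": [{"id": "dan"}]}
print(dereference(record, LANGUAGES)["languages"])
```

The stored record keeps only the identifier, and the expanded copy is what would be sent to the search index, so that titles can be searched and displayed without a second lookup.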

CHECKPOINT: One full vocabulary working all the way through the backend.
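As a rough picture of the last step in that chain, the facet over languages amounts to counting linked identifiers across records and attaching a human-readable label to each bucket. In practice the search engine would compute the counts; the in-memory version below is only a sketch, and the function name `language_facet` and the bucket keys (`key`, `label`, `doc_count`) are assumptions.

```python
from collections import Counter

VOCABULARY = {
    "eng": {"id": "eng", "title": {"en": "English"}},
    "dan": {"id": "dan", "title": {"en": "Danish", "da": "Dansk"}},
}

RECORDS = [
    {"languages": [{"id": "eng"}]},
    {"languages": [{"id": "eng"}, {"id": "dan"}]},
    {"languages": []},
]


def language_facet(records, vocabulary, locale="en"):
    """Count records per language and label each bucket in the given locale."""
    counts = Counter(
        language["id"]
        for record in records
        for language in record.get("languages", [])
    )
    return [
        {
            "key": key,
            # Fall back to the identifier when no title exists for the locale.
            "label": vocabulary[key]["title"].get(locale, key),
            "doc_count": count,
        }
        for key, count in counts.most_common()
    ]


print(language_facet(RECORDS, VOCABULARY, locale="da"))
```

The label lookup is the part that depends on a programmatic vocabulary API: the facet key stays stable while the label follows the visitor's locale.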

  • Build licenses vocabulary
  • Build subjects vocabulary
  • Build organisation vocabulary
  • Improve stack and APIs
    • Address performance challenges
    • Address machinery challenges (data flow, updates, etc.)
  • Validate vocabularies on Invenio-App-ILS
  • Test migration of resource type vocabulary

Training/Context

  • Ability to install and run InvenioRDM, with assets and module development workflows.
  • Training on Invenio-(Drafts|Records)-Resources data flow.