gbv/jskos-data

RVK: Extract Registereinträge

stefandesu opened this issue · 0 comments

We need to be able to extract the RVK Registereinträge (index entries) from the official dump. mc2skos does not convert them, and it is probably not worth adjusting it to include them. Instead, we could write an additional script that extracts them and optionally 1) adds them to the RVK JSKOS data (we have decided to use the field subjects for that) or 2) exports them as a JSKOS concordance (see "GND-Indexterme der RVK" in Cocoda; it just needs to be updated). Option 1) would be nice for indexing/search as well as for showing the data in Cocoda.

The Registereinträge are in field 750. Subfield 0 contains the ID (it needs to be cleaned, as in the example below), subfield a contains the label (which should be saved in prefLabel.de), and subfield 2 contains the source, the string "gnd" (I'm not sure whether other values occur, so entries with any other value should be filtered out).

This is an example entry for BV 9150:

<datafield tag="750" ind1="1" ind2="7">
  <subfield code="0">(DE-588)4122202-7</subfield>
  <subfield code="a">Homiletik</subfield>
  <subfield code="2">gnd</subfield>
</datafield>
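
A minimal Python sketch of the extraction step for a field like the one above. The function names are hypothetical, the `<record>` wrapper exists only for the demo, and "cleaning" the ID is assumed to mean dropping the parenthesised prefix such as `(DE-588)`. Namespaces are stripped so the code works whether or not the dump declares one.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the part to clean is always a leading parenthesised code, e.g. "(DE-588)".
PREFIX = re.compile(r"^\([^)]*\)")


def local_name(tag):
    """Return the tag name without an XML namespace, if one is present."""
    return tag.rsplit("}", 1)[-1]


def extract_register_entries(record):
    """Yield (gnd_id, label) for each 750 field whose subfield 2 is "gnd"."""
    for field in record.iter():
        if local_name(field.tag) != "datafield" or field.get("tag") != "750":
            continue
        # If a subfield code repeats, the last occurrence wins in this sketch.
        sub = {
            s.get("code"): (s.text or "").strip()
            for s in field
            if local_name(s.tag) == "subfield"
        }
        # Filter out other sources and incomplete entries.
        if sub.get("2") != "gnd" or not sub.get("0") or not sub.get("a"):
            continue
        yield PREFIX.sub("", sub["0"]), sub["a"]


record = ET.fromstring("""
<record>
  <datafield tag="750" ind1="1" ind2="7">
    <subfield code="0">(DE-588)4122202-7</subfield>
    <subfield code="a">Homiletik</subfield>
    <subfield code="2">gnd</subfield>
  </datafield>
</record>
""")
print(list(extract_register_entries(record)))  # [('4122202-7', 'Homiletik')]
```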

The resulting entry in the subjects field should be:

{
  "uri": "https://d-nb.info/gnd/4122202-7",
  "inScheme": [{ "uri": "http://bartoc.org/en/node/430" }],
  "prefLabel": { "de": "Homiletik" },
  "type": ["http://www.w3.org/2004/02/skos/core#Concept"]
}

Not sure if type is necessary or can be inferred.
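
For option 1), building the entry and attaching it to an RVK concept could look roughly like this. It is only a sketch: `gnd_subject` and `add_subjects` are hypothetical names, the concept is assumed to be a plain dict loaded from the RVK JSKOS data, and `type` is kept until it is clear whether it can be inferred.

```python
import json

GND_SCHEME = "http://bartoc.org/en/node/430"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"


def gnd_subject(gnd_id, label):
    """Build one entry for the subjects field from a cleaned GND ID and its label."""
    return {
        "uri": f"https://d-nb.info/gnd/{gnd_id}",
        "inScheme": [{"uri": GND_SCHEME}],
        "prefLabel": {"de": label},
        "type": [SKOS_CONCEPT],
    }


def add_subjects(concept, entries):
    """Append subjects to a JSKOS concept dict, skipping URIs already present."""
    subjects = concept.setdefault("subjects", [])
    seen = {s["uri"] for s in subjects}
    for gnd_id, label in entries:
        subject = gnd_subject(gnd_id, label)
        if subject["uri"] not in seen:
            subjects.append(subject)
            seen.add(subject["uri"])
    return concept


concept = {"uri": "http://rvk.uni-regensburg.de/nt/BV%209150"}
add_subjects(concept, [("4122202-7", "Homiletik")])
print(json.dumps(concept, ensure_ascii=False, indent=2))
```

Skipping URIs that are already present means the script can be re-run on data it has already enriched without producing duplicates.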

The resulting mapping for the concordance should be:

{
  "from": { "memberSet": [{ "uri": "http://rvk.uni-regensburg.de/nt/BV%209150" }] },
  "to": { "memberSet": [{ "uri": "https://d-nb.info/gnd/4122202-7" }] },
  "fromScheme": { "uri": "http://bartoc.org/en/node/533" },
  "toScheme": { "uri": "http://bartoc.org/en/node/430" },
  "creator": [{ "prefLabel": { "de": "UB Regensburg" } }],
  "type": [ "http://www.w3.org/2004/02/skos/core#mappingRelation" ],
  "partOf": [{ "uri": "https://coli-conc.gbv.de/api/concordances/rvk_gnd_ubregensburg" }]
}

(Maybe it would be good to include notation for the concept data as well because we include it when creating mappings in Cocoda.)
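
For option 2), a mapping like the one above could be generated as follows. This is a sketch under a few assumptions: `rvk_gnd_mapping` is a hypothetical name, the RVK URI is assumed to be the base URL plus the percent-encoded notation (as in the BV 9150 example), and the optional notation is only added on the RVK side.

```python
from urllib.parse import quote

RVK_BASE = "http://rvk.uni-regensburg.de/nt/"
RVK_SCHEME = "http://bartoc.org/en/node/533"
GND_SCHEME = "http://bartoc.org/en/node/430"
CONCORDANCE = "https://coli-conc.gbv.de/api/concordances/rvk_gnd_ubregensburg"
MAPPING_RELATION = "http://www.w3.org/2004/02/skos/core#mappingRelation"


def rvk_gnd_mapping(notation, gnd_id, with_notation=False):
    """Build one JSKOS mapping from an RVK notation to a cleaned GND ID."""
    # Assumption: "BV 9150" becomes ".../nt/BV%209150" via percent-encoding.
    source = {"uri": RVK_BASE + quote(notation)}
    if with_notation:
        source["notation"] = [notation]
    return {
        "from": {"memberSet": [source]},
        "to": {"memberSet": [{"uri": f"https://d-nb.info/gnd/{gnd_id}"}]},
        "fromScheme": {"uri": RVK_SCHEME},
        "toScheme": {"uri": GND_SCHEME},
        "creator": [{"prefLabel": {"de": "UB Regensburg"}}],
        "type": [MAPPING_RELATION],
        "partOf": [{"uri": CONCORDANCE}],
    }


mapping = rvk_gnd_mapping("BV 9150", "4122202-7", with_notation=True)
print(mapping["from"]["memberSet"][0])
```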