Mappings to Schema.org - picking up the thread
In July 2020, the Usage Board made good progress on mappings -- primarily on mappings to Schema.org, but Wikidata and Bibframe also got some attention. Our "summer intern" at the time, Grace Johnson, organized this with help from Karen. The main links:
Two Zoom calls in July 2020:
Status of Schema.org mappings as of July 2020:
- https://github.com/dcmi/usage/blob/master/mappings/DC_Mappings_to_Schema.org.md - based on a HackMD document
More candidate mappings:
Agenda as of July 2020
General questions about mapping
- Process (Github issues or Zoom calls?)
- Nature of mappings (close matches, or also broader/narrower?)
- Scope (obviously properties, but also DCMI Type Vocabulary and /terms/ classes?)
- Choice of mapping predicates (owl:equivalentProperty, skos:closeMatch/exactMatch?)
- Publication of mappings (Web page on DCMI website? in RDF? include in DCMI Metadata Terms?)
Wikidata mappings
- Choice of mapping predicates (P1628 "equivalent property"? P2888 "exact match"?)
- Inject into WD property pages, then extract by SPARQL query? - eg https://bit.ly/dcmimt_on_wikidata
- Static table? - See https://www.wikidata.org/wiki/User:Gjohnson1110
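For the "inject, then extract by SPARQL" option, the extraction step might look roughly like the sketch below. It assumes the mappings are stored on Wikidata property pages using P1628 (or P2888), that the DCMI terms namespace is `http://purl.org/dc/terms/`, and that the query is run on the Wikidata Query Service, where the `wdt:`, `wikibase:` and `bd:` prefixes are predefined. The helper name `build_mapping_query` is made up for illustration, and this is not necessarily the query behind the short link above.

```python
# Assumed namespace for DCMI Metadata Terms in the /terms/ namespace.
DCTERMS_NS = "http://purl.org/dc/terms/"


def build_mapping_query(predicate: str = "P1628", namespace: str = DCTERMS_NS) -> str:
    """Build a SPARQL query listing Wikidata properties mapped into `namespace`.

    Hypothetical helper: `predicate` is the Wikidata property that carries
    the mapping, e.g. P1628 ("equivalent property") or P2888 ("exact match").
    """
    return f"""
SELECT ?wdProperty ?wdPropertyLabel ?dcTerm WHERE {{
  ?wdProperty wdt:{predicate} ?dcTerm .
  FILTER(STRSTARTS(STR(?dcTerm), "{namespace}"))
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY ?dcTerm
""".strip()


# Print the query text so it can be pasted into a query service.
print(build_mapping_query())
```

Passing `"P2888"` as the first argument gives the "exact match" variant, which makes it easy to compare what each predicate choice would return.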
Takeaways and ideas from the July 2020 meetings:
Consensus: do not prioritize Wikidata. Unstable; changes quickly. Scale is daunting. Not used for bibliographic data.
Consensus: prioritize Schema.org. Goal: translate DC data to Schema and know it will make sense; need not necessarily be 1-to-1. Main properties and classes in Schema are relatively stable. Close matches DC-to-Schema have the most use cases: eg "creator" to "creator". Mapping to classes should be a lower priority. Where Schema properties are more specialized, map Schema to DC - eg Schema "editor". Note: broader mappings are harder to maintain. Need to avoid expectation that we "got them all" (ie found all relevant mappings).
Dilemma: DC terms most used in data are the 15 unmappably broad ones. Not their subproperties. We may wish that they used the subproperties but this is not the case.
Example: DC contributor is broader than Schema contributor. Use skos:broadMatch/narrowMatch? But in many cases, they may be interchangeable. Use skos:closeMatch ("two properties are interchangeable in some context") and explain the context? Schema contributor rdfs:subPropertyOf DC contributor? Or can we just say it in words without using subproperty? Most users will not know what we mean by subproperty. They may understand "close match". They will understand "use for" (a la ISO 2788).
Example: DC coverage. Legacy definition is so broad. Acknowledge that there is no mapping to just dc:coverage. Recommend picking coverage subproperties: dcterms:temporal and spatial. Then provide matches for those subproperties? Schema:about is also relevant.
Example: DC date. Map to Schema start date and end date? Recommend that people pick one of the date subproperties, then provide matches for those subproperties? We have good equivalents for about half. Note that there is nothing in Schema that dcterms:date can be mapped to without risking garbage data.
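The "recommend a subproperty first" idea from the coverage and date examples could be captured as a small lookup that produces guidance for a human, not an automatic conversion. This is a minimal sketch: the table `BROAD_TERMS` and the function `advise` are hypothetical names, the refinements listed are the subproperties of coverage and date in the /terms/ namespace, and no Schema.org targets are asserted because the notes do not settle them.

```python
# Broad DC-15 terms that the notes flag as having no safe direct mapping,
# each paired with the dcterms subproperties a user could choose instead.
BROAD_TERMS = {
    "coverage": ("temporal", "spatial"),
    "date": (
        "available", "created", "dateAccepted", "dateCopyrighted",
        "dateSubmitted", "issued", "modified", "valid",
    ),
}


def advise(term: str) -> str:
    """Return human-readable guidance for mapping a DC term (illustrative only)."""
    refinements = BROAD_TERMS.get(term)
    if refinements is None:
        # Not flagged in this sketch; a close match may exist.
        return f"{term}: not flagged as too broad here; look for a close match."
    options = ", ".join(f"dcterms:{name}" for name in refinements)
    return f"{term}: no direct Schema.org mapping; pick one of {options} and map that instead."


for name in ("coverage", "date", "creator"):
    print(advise(name))
```

The choice among refinements still has to be made by a person who knows the data, which is the part an automatic mapping cannot supply.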
Next steps
- Work on hack.md, which supports comments?
  - Leave what's there but select "best one" for publication?
- Decide whether we make mappings from broader terms that have no equivalents or find some other way.
- Say: "it could be this it could be that and here's how to decide". A document for humans, not for automatic conversion? Automatic mappings cannot accommodate the conditional choices that a human would have to make.
- Limit machine-readable mappings to close matches?
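If the machine-readable output were limited to close matches, publication could be as simple as filtering a candidate list and writing N-Triples. The sketch below assumes `https://schema.org/` as the Schema.org namespace and uses only the two examples mentioned in the notes ("creator" to "creator", and Schema "editor" as narrower than DC contributor). The list `CANDIDATES` and the function name are hypothetical, and the entries are illustrations, not agreed mappings.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
DCTERMS = "http://purl.org/dc/terms/"
SCHEMA = "https://schema.org/"  # assumed namespace form

# (subject, SKOS mapping relation, object); illustrative entries only.
CANDIDATES = [
    (DCTERMS + "creator", "closeMatch", SCHEMA + "creator"),
    # broadMatch: the object (DC contributor) is broader than the subject.
    (SCHEMA + "editor", "broadMatch", DCTERMS + "contributor"),
]


def close_matches_as_ntriples(candidates):
    """Serialize only the skos:closeMatch candidates as N-Triples lines."""
    lines = []
    for subject, relation, obj in candidates:
        if relation != "closeMatch":
            continue  # broader/narrower mappings stay in the human-readable document
        lines.append(f"<{subject}> <{SKOS}{relation}> <{obj}> .")
    return "\n".join(lines)


print(close_matches_as_ntriples(CANDIDATES))
```

Keeping the broader and narrower candidates in the same list but out of the serialized output means the human-readable document and the machine-readable file can share one source.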
Hi, Tom. I'm not understanding the section below. Are you referring to
the DC 1.1? In any case, an example or two would be much appreciated,
and any other explanation you can give.
I perhaps phrased the point badly. I meant to say that the DC terms most used in data are the "DC-15", but that we found that some of these were difficult or impossible to map. The most obvious example is DC coverage, where our own usage note says: "Because coverage is so broadly defined, it is preferable to use the more specific subproperties Temporal Coverage and Spatial Coverage". We also struggled with Date.