Derive shapes from maps
I would like to propose a new feature that generates minimal SHACL shapes from the mappings. The purpose is to provide a starting point for defining more specific constraints over the output data. For example, given the mapping shown in the language reference:
map AirportMapping from airport {
    subject template "http://airport.example.com/{0}" with id;

    graphs
        template "http://airport.example.com/graph/stop/{0}" with id;
        constant "http://www.w3.org/ns/r2rml#defaultGraph";

    types transit.Stop

    properties
        transit.route from stop with datatype xsd.integer;
        wgs84_pos.lat from latitude;
        wgs84_pos.long from longitude;
}
One would be able to produce a shape with minimal constraints:
<AirportMappingShape>
  a sh:NodeShape ;
  sh:targetClass transit:Stop ;
  sh:property
    <AirportMappingShape/transit:route> ,
    <AirportMappingShape/wgs84_pos:lat> ,
    <AirportMappingShape/wgs84_pos:long> ;
.

<AirportMappingShape/transit:route>
  sh:path transit:route ;
  sh:datatype xsd:integer ;
  sh:nodeKind sh:Literal ;
.

<AirportMappingShape/wgs84_pos:lat>
  sh:path wgs84_pos:lat ;
  sh:nodeKind sh:Literal ;
.

<AirportMappingShape/wgs84_pos:long>
  sh:path wgs84_pos:long ;
  sh:nodeKind sh:Literal ;
.
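To make the derivation rule concrete, here is a rough Python sketch of how such a shape could be produced. It is only an illustration under assumptions: the `mapping` dictionary is a hand-written, hypothetical stand-in for a parsed `map` block (not an actual XRM data structure), `derive_shape` is an invented helper name, and every property is assumed to be literal-valued as in the example.

```python
# Hypothetical in-memory form of the AirportMapping block; not an XRM API.
mapping = {
    "name": "AirportMapping",
    "types": ["transit:Stop"],
    "properties": [
        {"path": "transit:route", "datatype": "xsd:integer"},
        {"path": "wgs84_pos:lat"},
        {"path": "wgs84_pos:long"},
    ],
}


def derive_shape(mapping):
    """Return Turtle for a minimal node shape plus one named property shape per property."""
    shape = f"{mapping['name']}Shape"
    prop_iris = [f"<{shape}/{p['path']}>" for p in mapping["properties"]]

    lines = [f"<{shape}>", "  a sh:NodeShape ;"]
    for cls in mapping["types"]:
        lines.append(f"  sh:targetClass {cls} ;")
    if prop_iris:
        lines.append("  sh:property")
        for i, iri in enumerate(prop_iris):
            sep = " ;" if i == len(prop_iris) - 1 else " ,"
            lines.append(f"    {iri}{sep}")
    lines.append(".")

    for prop, iri in zip(mapping["properties"], prop_iris):
        lines += ["", iri, f"  sh:path {prop['path']} ;"]
        if "datatype" in prop:
            lines.append(f"  sh:datatype {prop['datatype']} ;")
        # Assumption: the mapped value is a literal, as in the example above.
        lines += ["  sh:nodeKind sh:Literal ;", "."]
    return "\n".join(lines)


turtle = derive_shape(mapping)
print(turtle)
```

The output omits prefix declarations, just like the hand-written example.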
It's important that property shapes are named nodes, so that they can be extended by adding properties in a separate document and merging the two.
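A small sketch of why named nodes matter here, with triples modelled as plain Python tuples. The extension statements (`xsd:decimal`, `sh:minCount 1`) are made up for illustration and are not part of the proposal.

```python
SHAPE = "<AirportMappingShape/wgs84_pos:lat>"

# Triples as (subject, predicate, object) tuples, as they might be generated from the mapping.
generated = {
    (SHAPE, "sh:path", "wgs84_pos:lat"),
    (SHAPE, "sh:nodeKind", "sh:Literal"),
}

# Hand-written extension kept in a separate document (illustrative values).
extension = {
    (SHAPE, "sh:datatype", "xsd:decimal"),
    (SHAPE, "sh:minCount", "1"),
}

# Merging two RDF graphs is a set union; statements about the same IRI land on one node.
merged = generated | extension

lat_shape = sorted((p, o) for s, p, o in merged if s == SHAPE)
print(lat_shape)
```

If the property shapes were blank nodes instead, the extension document would have no stable identifier to attach its statements to.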
Given multiple mappings for the same predicate, the property shape might require sh:or or a combined node kind such as sh:IRIOrLiteral.
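As a rough sketch of the sh:or case: the input pairs below are hypothetical (the second `transit:route` datatype is invented to force a conflict), and `property_shape` is an illustrative helper, not part of any existing tool.

```python
from collections import defaultdict

# Hypothetical (path, constraint) pairs collected from several mapping lines.
mapped = [
    ("transit:route", "sh:datatype xsd:integer"),
    ("transit:route", "sh:datatype xsd:string"),
    ("wgs84_pos:lat", "sh:nodeKind sh:Literal"),
]


def property_shape(path, alternatives):
    """Emit one property shape; combine alternatives with sh:or when there are several."""
    lines = [f"<AirportMappingShape/{path}>", f"  sh:path {path} ;"]
    if len(alternatives) == 1:
        lines.append(f"  {alternatives[0]} ;")
    else:
        members = " ".join(f"[ {alt} ]" for alt in alternatives)
        lines.append(f"  sh:or ( {members} ) ;")
    lines.append(".")
    return "\n".join(lines)


by_path = defaultdict(list)
for path, constraint in mapped:
    if constraint not in by_path[path]:  # identical constraints need no sh:or
        by_path[path].append(constraint)

shapes = {path: property_shape(path, alts) for path, alts in by_path.items()}
print(shapes["transit:route"])
```

Keeping a single property shape per predicate preserves the stable `<AirportMappingShape/…>` names, at the cost of a slightly more complex shape body.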
To implement this feature, I would propose to slightly adapt (and also simplify) the feature proposed in #115. I will create a draft PR to illustrate.
Shapes derived from the mapping don't necessarily describe the output graph of the pipeline; often there are post-processing steps after the mapping.
Nevertheless, there are likely cases for which shapes derived from the mapping are useful (maybe also for troubleshooting pipelines or the mapping itself by validating intermediate results).
Some things to consider, if shapes are derived from the mapping (in general, not related to the proposal in PR #126 ... more of a "notes-to-self"):
- The mapping might be overspecified and not representative of the resulting data graph (e.g. using an XPath expression that doesn't match anything)
- A mapping block declaring multiple types would result in a shape targeting multiple classes
- One graph resource can be populated from multiple mapping blocks. In this case only the sum of the constraints from the resulting multiple NodeShapes would describe the resource (and the derived NodeShapes could not be sh:closed individually)
- Mapping blocks are aligned to input blocks (e.g. a table). One input block can have multiple mapping blocks
- In the mapping block, we don't have an alias for the property, so the property name would have to be used verbatim. This could become an issue if the generated shapes are extended with statements from a separate document and the schema changes
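The point about sh:closed can be illustrated with a simplified model in Python. Everything here is an assumption for the sake of the example: `AirportNameMappingShape` and its `rdfs:label` property are invented, and "closed" is reduced to "only the listed paths are allowed" (real SHACL also honours sh:ignoredProperties).

```python
# Hypothetical: two mapping blocks populate the same subject, each with its own properties.
shape_paths = {
    "AirportMappingShape": {"transit:route", "wgs84_pos:lat", "wgs84_pos:long"},
    "AirportNameMappingShape": {"rdfs:label"},
}

# Properties found on one resource in the output graph (rdf:type left out for brevity).
resource_paths = {"transit:route", "wgs84_pos:lat", "wgs84_pos:long", "rdfs:label"}


def closed_violations(allowed, actual):
    """Properties a closed shape would reject: everything not among its paths."""
    return actual - allowed


# Each derived shape on its own rejects the properties contributed by the other block.
individually = {
    name: closed_violations(paths, resource_paths)
    for name, paths in shape_paths.items()
}

# Only the union of all derived shapes accounts for every property on the resource.
combined = closed_violations(set().union(*shape_paths.values()), resource_paths)

print(individually)
print(combined)
```

So a generator would either have to leave the derived shapes open, or merge the shapes of all mapping blocks that share a subject before closing them.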
(Unrelated to this feature request, but related to the last point of the above list) Decoupling the mapping from the schema by pointing from the mapping to shape elements, rather than schema elements, could be an option to facilitate handling schema changes (shape-first, shape-as-contract).
My plan is to make xrm more hackable, in order to unlock possibilities for toolchain improvements outside of the xrm editor itself, like #127 and #128.
For one-time scaffolding, introspecting the shapes from the output graph of the pipeline might be an alternative.
Here's a query to illustrate this, based on the construct query that SPEX is running in "introspection" mode. I used this in a customer project.
Note: The query has dependencies on spif: functions which GraphDB has built-in. They need to be replaced for running the query on other stores.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mobi: <https://schema.mobicorp.ch/>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX schema: <http://schema.org/>
PREFIX spif: <http://spinrdf.org/spif#>

CONSTRUCT {
  ?nodeShape a sh:NodeShape .
  ?nodeShape sh:targetClass ?cls .
  ?nodeShape sh:property ?propertyShape .
  ?propertyShape a sh:PropertyShape .
  ?propertyShape sh:path ?property .
  ?propertyShape sh:class ?linktype .
  ?propertyShape sh:datatype ?datatype .
} WHERE {
  VALUES ?cls {
    # mobi:Table
    # mobi:Column
    mobi:Mitarbeiter
    mobi:Organisationseinheit
  }
  ?subject a ?cls .
  ?subject ?property ?object .
  OPTIONAL {
    ?object a ?linktype .
  }
  MINUS {
    # --- blacklist ---
    VALUES ?cls {
      rdf:Property
      owl:TransitiveProperty
      owl:SymmetricProperty
      rdf:List
      rdfs:Class
      rdfs:Datatype
      rdfs:ContainerMembershipProperty
      # -------------
      mobi:ArchitektursichtElement
      mobi:OrganisationsElement
      mobi:ProzessElement
      mobi:FunktionsElement
      mobi:IntegrationsElement
      mobi:InformationsElement
      # -------------
      mobi:Informationsobjekt
      mobi:Informationsobjektbeziehung
      mobi:Informationsattribut
      mobi:Rollenbesetzung
      # -------------
      mobi:edc\/UiView
      mobi:edc\/Link
      sh:PropertyShape
      skos:ConceptScheme
      skos:Concept
    }
    ?subject a ?cls .
  }
  BIND(DATATYPE(?object) AS ?datatype)
  BIND(spif:buildURI("<urn:NodeShape:{?1}>", spif:encodeURL(str(?cls))) AS ?nodeShape)
  BIND(spif:buildURI("<urn:PropertyShape:{?1}/{?2}>", spif:encodeURL(str(?cls)), spif:encodeURL(str(?property))) AS ?propertyShape)
}
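For readers who want to follow what the query does without a triple store, here is a rough Python approximation of the same introspection over an in-memory list of triples. It is a sketch under assumptions: the sample resources (`urn:ex:anna`, `urn:ex:unit1`) are invented, the blacklist in the MINUS block is left out, and percent-encoding via `urllib.parse.quote` only approximates whatever `spif:encodeURL` produces.

```python
from urllib.parse import quote

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MOBI = "https://schema.mobicorp.ch/"

# Tiny hypothetical data graph. Literals are (lexical form, datatype IRI) tuples;
# everything else is an IRI string.
triples = [
    ("urn:ex:anna", RDF_TYPE, MOBI + "Mitarbeiter"),
    ("urn:ex:anna", "http://schema.org/name", ("Anna", XSD_STRING)),
    ("urn:ex:anna", "http://schema.org/memberOf", "urn:ex:unit1"),
    ("urn:ex:unit1", RDF_TYPE, MOBI + "Organisationseinheit"),
]


def introspect(triples, target_classes):
    """Collect, per class and property, the datatypes and linked classes seen in the data."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    shapes = {}
    for s, p, o in triples:
        for cls in types.get(s, set()) & set(target_classes):
            node_shape = "urn:NodeShape:" + quote(cls, safe="")
            prop_shape = f"urn:PropertyShape:{quote(cls, safe='')}/{quote(p, safe='')}"
            entry = shapes.setdefault(node_shape, {}).setdefault(
                prop_shape, {"path": p, "datatype": set(), "class": set()}
            )
            if isinstance(o, tuple):
                # Literal object: record its datatype (sh:datatype in the query).
                entry["datatype"].add(o[1])
            else:
                # IRI object: record the classes of the linked node, if any (sh:class).
                entry["class"] |= types.get(o, set())
    return shapes


shapes = introspect(triples, [MOBI + "Mitarbeiter"])
for node_shape, props in shapes.items():
    print(node_shape, sorted(v["path"] for v in props.values()))
```

On stores without the spif: functions, the standard SPARQL 1.1 functions `IRI`, `CONCAT` and `ENCODE_FOR_URI` look like a plausible replacement for building the shape IRIs, though I have not verified that they encode exactly the same way.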
Shapes derived from the mapping don't necessarily describe the output graph of the pipeline; often there are post-processing steps after the mapping.
Yes, I realised that too while thinking about my proposal. In museumplus it is just like that: the XRM is only a temporary representation and has nothing in common with the final representation.
Maybe I did not mention that precisely, but my idea was that shapes defined in XRM could also be unrelated to the mapping itself.
-node-shape PersonNodeShape from PersonMapping {
+node-shape PersonNodeShape {
}
That way one could take advantage of a simpler syntax, although that would be slightly incomplete without nice support for vocabularies (re #14).
My plan is to make xrm more hackable
I cannot really comment on that, but I'm intrigued about how hackability helps. Let's discuss that.
See also https://github.com/RMLio/RML2SHACL
Paper: RML2SHACL: RDF Generation Is Shaping Up
https://lirias.kuleuven.be/retrieve/641696