SHACL shapes for the eCH-0200 standard

SHACL Shapes for the DCAT Application Profile for Data Portals in Switzerland

This project provides SHACL shapes to validate metadata against the eCH-0200 standard.

Files

  • ech-0200.shacl.ttl : This file models the constraints defined in eCH-0200 with the SHACL vocabulary.
  • examples/ : This directory contains RDF Turtle files that can be used to test ech-0200.shacl.ttl. By convention, files ending in .valid.ttl are expected to validate, while files ending in .fail.ttl are expected to fail validation.
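
The naming convention can be checked mechanically. The following is a minimal sketch, not part of this repository: it assumes that the shaclvalidate command from the Usage section is on the PATH and that it prints its validation report as Turtle containing a line of the form "sh:conforms true". The function names and the VALIDATOR and SHAPES variables are hypothetical.

```shell
#!/bin/sh
# Sketch only: VALIDATOR and SHAPES are illustrative names, not project settings.
VALIDATOR="${VALIDATOR:-shaclvalidate}"
SHAPES="${SHAPES:-ech-0200.shacl.ttl}"

conforms() {
    # Succeeds if the report for data file $1 contains "sh:conforms true".
    # Assumption: the validator prints its report as Turtle on stdout.
    "$VALIDATOR" -shapesfile "$SHAPES" -datafile "$1" 2>/dev/null \
        | grep -Eq 'sh:conforms[[:space:]]+true'
}

check_examples() {
    # $1: directory holding *.valid.ttl and *.fail.ttl files.
    # Returns 0 only if every file behaves as its name promises.
    rc=0
    for f in "$1"/*.valid.ttl; do
        [ -e "$f" ] || continue
        if conforms "$f"; then
            echo "ok   $f"
        else
            echo "FAIL $f (expected to validate)"
            rc=1
        fi
    done
    for f in "$1"/*.fail.ttl; do
        [ -e "$f" ] || continue
        if conforms "$f"; then
            echo "FAIL $f (expected not to validate)"
            rc=1
        else
            echo "ok   $f"
        fi
    done
    return $rc
}
```

After sourcing the snippet, running check_examples examples from the repository root would report one line per example file. If the validator's report format differs from the assumption above, only the grep pattern needs to change.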

Usage

To validate data with these shapes, you need a SHACL validator such as the TopBraid SHACL API. With this validator you can validate RDF Turtle files as follows:

$ shaclvalidate -shapesfile ech-0200.shacl.ttl -datafile data.ttl

For example (from this directory):

$ shaclvalidate -shapesfile ech-0200.shacl.ttl -datafile ./examples/minimal.valid.ttl

Validating files in formats other than Turtle

As the TopBraid SHACL validator only supports the Turtle RDF format, files in other formats, such as RDF/XML, must first be converted to Turtle.

First you need Apache Jena. You can download and extract it with these commands:

$ wget https://www-eu.apache.org/dist/jena/binaries/apache-jena-3.9.0.tar.gz
$ tar xvzf apache-jena-3.9.0.tar.gz

This creates a folder named apache-jena-3.9.0. To convert your RDF/XML file (e.g. file.rdf) you can use this command:

$ ./apache-jena-3.9.0/bin/riot --output=turtle --syntax=rdfxml file.rdf > file.ttl

The converted result is written to file.ttl.

Jena supports these RDF formats: turtle, ntriples, nquads, trig and rdfxml.
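
When several RDF/XML files need converting, the command above can be wrapped in a loop. This is a rough sketch under stated assumptions: the RIOT variable and the function name are hypothetical, and the default path assumes Jena was extracted into the current directory as described above.

```shell
#!/bin/sh
# Sketch only: RIOT is an illustrative variable pointing at Jena's riot tool.
RIOT="${RIOT:-./apache-jena-3.9.0/bin/riot}"

convert_rdfxml_dir() {
    # Convert every *.rdf file in directory $1 to a sibling *.ttl file.
    for src in "$1"/*.rdf; do
        [ -e "$src" ] || continue
        dst="${src%.rdf}.ttl"
        if "$RIOT" --output=turtle --syntax=rdfxml "$src" > "$dst"; then
            echo "converted $src -> $dst"
        else
            # Do not leave a partial Turtle file behind.
            echo "could not convert $src" >&2
            rm -f "$dst"
        fi
    done
}
```

Calling convert_rdfxml_dir on a directory leaves each .ttl file next to its source, ready to be passed to the validator with -datafile.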

References

This project is similar to and partially based on the EU DCAT-AP SHACL constraint definitions.

Note on Language

While the eCH-0200 specification is available in German and French, the SHACL shapes are documented in English to better align with other shape files and tools that are likely to be used alongside them.

Comments on the Interpretation of the Specification

  • The specification mandates the use of schema:url as a class. This seems to be a mistake, so we assume that schema:URL is what is meant.
  • The SHACL file also accepts xsd:dateTime where the spec mandates xsd:date.
  • Inference: The specification isn't explicit about whether inference should be allowed and, if so, which kind. We assume that where vcard:Kind is allowed, its subclasses (Individual, Organization, Group, Location) should be allowed too. SHACL only takes ontological statements from the data graph, not from the shapes graph, so a subclass is currently only accepted if the respective rdfs:subClassOf statement is also present in the data. We could of course explicitly allow some named subclasses in the shape file, but this doesn't seem to be wanted by the spec.
  • The type (foaf:Document) does not need to be explicitly specified for a document to validate; the type can be inferred from the rdfs:range (foaf:Document) of the property that refers to the document.
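
To illustrate the inference point above, the following sketch writes a small Turtle fragment in which the contact point is typed as vcard:Organization and the required rdfs:subClassOf statement is shipped with the data. The file name, the OUT variable, the dataset IRI and the contact details are all made up for illustration, and the fragment is not a complete eCH-0200 dataset, so it is not expected to pass validation on its own.

```shell
#!/bin/sh
# Sketch only: OUT and all example.org values are hypothetical.
OUT="${OUT:-subclass-example.ttl}"

cat > "$OUT" <<'EOF'
@prefix dcat:  <http://www.w3.org/ns/dcat#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .

# Ontological statement that has to travel with the data, because
# SHACL does not read subclass relations from the shapes graph.
vcard:Organization rdfs:subClassOf vcard:Kind .

<http://example.org/dataset/1> a dcat:Dataset ;
    dcat:contactPoint [
        a vcard:Organization ;
        vcard:fn "Example Office" ;
        vcard:hasEmail <mailto:info@example.org>
    ] .
EOF
```

Without the rdfs:subClassOf line, a SHACL processor has no way to know that an instance of vcard:Organization is also an instance of vcard:Kind.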

Other Points to Discuss

  • Shouldn't we require a dataset to be named (using a standard IRI) rather than requiring a proprietary dct:identifier?
  • Also, shouldn't the dct:publisher be named, rather than being an instance of foaf:Agent? Analogous questions can be asked for dcat:themeTaxonomy and foaf:homepage.
  • It seems inconsistent to forbid adms:status on distributions while generally allowing arbitrary properties.

License

As a prospective part of an eCH standard, the code and documentation in this repository can be used, distributed and further developed without any restriction by patents or licenses.