w3c/data-shapes

use case: local parameterised messages


As suggested in issue SEMICeu/DCAT-AP#355, I am contributing a use case.

SHACL specifies that shacl:message overwrites the message generated by the engine.

Thus for the constraint

<https://semiceu.github.io//DCAT-AP/releases/3.0.0#AgentShape/236f0210baaf149903750c43bbe7012c21debb2a>
  rdfs:seeAlso "https://semiceu.github.io//DCAT-AP/releases/3.0.0#Agent.type";
  shacl:description "A type of the agent that makes the Catalogue or Dataset available."@en;
  shacl:maxCount 1;
  shacl:name "type"@en;
  shacl:path dc:type.

and the data

<https://test.com/id/agent/1221> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Agent> .
<https://test.com/id/agent/1221> <http://xmlns.com/foaf/0.1/name> "agent 1"@en .
<https://test.com/id/agent/1221> <http://purl.org/dc/terms/type> "type ORG A" .
<https://test.com/id/agent/1221> <http://purl.org/dc/terms/type> "type ORG B" .

The SHACL engine will produce a precise message indicating that there are 2 values instead of 1:
"Property may only have 1 value, but found 2".

If you replace it by adding a shacl:message, the message will instead be: "Maximally 1 values allowed for type".

The overwrite happens even if you add a shacl:message in a language that the engine itself does not provide.
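To make the overwrite concrete, here is a minimal sketch of a shape that carries only a Dutch message, followed by the validation result that the SHACL 1.0 rules on sh:message would prescribe for the data above. The `ex:` shape names and the Dutch wording are assumptions made for illustration; they are not taken from DCAT-AP. The sketch uses the `sh:` prefix.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dc:   <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/shapes#> .

# Shapes graph (hypothetical names): the property shape declares
# a custom message in Dutch only.
ex:AgentShape
    a sh:NodeShape ;
    sh:targetClass foaf:Agent ;
    sh:property ex:AgentTypeShape .

ex:AgentTypeShape
    a sh:PropertyShape ;
    sh:path dc:type ;
    sh:maxCount 1 ;
    sh:message "Maximaal 1 waarde toegestaan voor type"@nl .

# Validation result for agent 1221: because the shape has a value for
# sh:message, that value becomes the only sh:resultMessage, and the
# engine's own "... but found 2" text is no longer reported.
[] a sh:ValidationResult ;
    sh:resultSeverity sh:Violation ;
    sh:focusNode <https://test.com/id/agent/1221> ;
    sh:resultPath dc:type ;
    sh:sourceShape ex:AgentTypeShape ;
    sh:sourceConstraintComponent sh:MaxCountConstraintComponent ;
    sh:resultMessage "Maximaal 1 waarde toegestaan voor type"@nl .
```

The shapes graph and the result are shown in one block only for readability; in practice they live in separate graphs.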

The engine message is more detailed and pinpoints the error better, but we also want to support multilingual messages, so we added an extra property.

You can try it with the ITB testbed, which uses the reference SHACL library as its backbone.

Testbed Instance: https://www.itb.ec.europa.eu/shacl/any/upload.

There is nothing in the SHACL spec that prohibits or discourages using multiple languages for sh:message. It is up to the implementation to pick (or ignore) such values. In our servlets, we use the accepted language from the HTTP request to pick the most suitable language, but I believe that if, for example, the ontology only declares messages with "nl" as language and "nl" is not among the accepted languages, it may fall back to the default message. The assumption is that the default message in English is better than a custom message in Dutch unless the receiver is actually able to understand Dutch.

(In your examples, note that the preferred namespace prefix for SHACL is "sh" and not "shacl".)

The issue is that the spec does not provide a way to get the parameters back from the engine in order to build correct, readable statements in any language, as illustrated.

In the multilingual EU context, providing error messages in the language of the data owner is important. Hence this workaround.

In addition, this highlights that the error message returned by the engine is often not meaningful for a data engineer. It needs the context of the rule that produced it. In practice, error messages that connect the two are needed.

I see the following options:

  • allow the engine error reports to be manipulated in the SHACL language,
  • sh:message values in languages other than those the engine supports are added to the engine message instead of replacing it (replacement being the current behaviour in the spec),
  • sh:message is additive instead of overwriting; in this case English is treated the same as any other language,
  • the engine provides its message in a different property than the one set by sh:message (possibly controlled by a specific property).
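As a rough sketch of the last option, a validation result could keep the custom message in sh:resultMessage and expose the engine-generated text separately. The property `ex:engineMessage` and the shape name `ex:AgentTypeShape` are hypothetical and exist neither in SHACL nor in any implementation I know of; the Dutch wording is likewise only illustrative.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/shapes#> .

[] a sh:ValidationResult ;
    sh:resultSeverity sh:Violation ;
    sh:focusNode <https://test.com/id/agent/1221> ;
    sh:resultPath dc:type ;
    sh:sourceShape ex:AgentTypeShape ;
    sh:sourceConstraintComponent sh:MaxCountConstraintComponent ;
    # Copied from sh:message on the shape, as in SHACL 1.0.
    sh:resultMessage "Maximaal 1 waarde toegestaan voor type"@nl ;
    # Hypothetical property, not part of SHACL: preserves the engine's
    # own, more precise message next to the custom one.
    ex:engineMessage "Property may only have 1 value, but found 2" .
```

A frontend could then show the localized message together with the engine detail, rather than having to choose between them.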

Our process to record suggestions for SHACL 1.2 is to open issues at https://github.com/w3c/shacl/issues

This repo here is only for editorial changes to SHACL 1.0 and even those probably won't get done as the WG is closed.

@bertvannuffelen Can you please clarify what you want?

  • your third link doesn't use sh:message but some new dataspace <https://purl.eu/ns/shacl#message>
  • your fourth link uses sh:message in Dutch. You could also provide it in English

Reading from the spec:

The only omission I see is that

@VladimirAlexiev sorry for the late reply. Thanks for taking this forward.

@bertvannuffelen Can you please clarify what you want?

* your third link doesn't use `sh:message` but some new dataspace `<https://purl.eu/ns/shacl#message>`

This is intentional: it will show the internal message from the SHACL engine (same as omitting sh:message).

* your fourth link uses `sh:message` in Dutch. You could also provide it in English

That is intentional: it will show the message only in Dutch and not the internal message from the SHACL engine.

All these cases demonstrate the implementation behaviour: each case, although expressing the same constraint, presents a different message to the user. Unfortunately these are not translations of each other, but different values that come close to each other. Using the sh:message construct mostly loses valuable information.

Reading from the spec:

* https://www.w3.org/TR/shacl/#x2.-shapes-and-constraints clearly shows `message` applies to both NodeShapes and PropertyShapes

* https://www.w3.org/TR/shacl/#x2.1.5-declaring-messages-for-a-shape says that you can use multiple values in different languages, and all of them are copied to ValidationResult. There's your multilinguality: then the frontend only needs to pick a language appropriate for the user (with fallback)

* https://www.w3.org/TR/shacl/#x2.1.4-declaring-the-severity-of-a-shape gives such an example

Correct, my point is not the absence of multilinguality, but the different handling of messages in languages other than those of the engine implementation.
As you point out, the SHACL specification allows user-defined SPARQL queries to have the {?vars} interpolated into the message, but this is not the case for engine messages.
Without that possibility, one has to work around it and introduce alternative enrichments to get closer to this objective.
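For comparison, this is roughly what is possible today if the cardinality check is rewritten as a SPARQL-based constraint, so that the count becomes available as a variable for message interpolation. It is only a sketch: the `ex:` names, the ontology IRI, and both message texts are assumptions, and whether a particular engine accepts the aggregate in the SELECT query would need to be checked.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/shapes#> .

# Prefix declarations used by the embedded SPARQL query (hypothetical ontology IRI).
<http://example.org/shapes>
    a owl:Ontology ;
    sh:declare [
        sh:prefix "dc" ;
        sh:namespace "http://purl.org/dc/terms/"^^xsd:anyURI
    ] .

ex:AgentTypeCountShape
    a sh:NodeShape ;
    sh:targetClass foaf:Agent ;
    sh:sparql [
        a sh:SPARQLConstraint ;
        sh:prefixes <http://example.org/shapes> ;
        # {?count} is substituted from the SELECT bindings, so both
        # languages can report the actual number of values.
        sh:message "Property type may only have 1 value, but found {?count}"@en ;
        sh:message "Eigenschap type mag slechts 1 waarde hebben, maar er zijn er {?count} gevonden"@nl ;
        sh:select """
            SELECT $this (COUNT(?type) AS ?count)
            WHERE {
                $this dc:type ?type .
            }
            GROUP BY $this
            HAVING (COUNT(?type) > 1)
        """ ;
    ] .
```

The drawback is that every built-in constraint such as sh:maxCount would have to be reimplemented in SPARQL just to obtain parameterised messages, which is exactly the workaround this use case would like to avoid.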

I posted some possible solutions, but the one below corresponds to bullet point 2. If that could be realised, that would be wonderful.

The only omission I see is that

* https://www.w3.org/TR/shacl/#x5.3.2-mapping-of-solution-bindings-to-result-properties specifies how `{?vars}` are interpolated into the message, but only for SPARQL constraints.

* It would be useful to allow this for all shapes, but what vars can be used?

* I'll post an improvement issue