ucoProject/UCO

UCO should perform OWL 2 DL review with SHACL-SPARQL

ajnelson-nist opened this issue.

Background

Many questions have come up over the years of UCO's development about whether its usage of OWL 2 DL is correct. Since the UCO and CASE prototypes, development has proceeded without an OWL review mechanism that could catch issues ranging from the elementary, such as whether the Turtle syntax was correct (resolved once a syntax normalizer was adopted for Continuous Integration), to the advanced, such as whether the ontology defines unsatisfiable classes (i.e., classes that constrain themselves, intentionally or otherwise, to always be empty).

Some engines have been tried for determining UCO's OWL 2 DL conformance, but they have frequently been incompatible in one way or another with UCO's usage of SHACL. Most pointedly, SHACL is not defined as an OWL ontology in any way that exercises OWL classes or properties, which causes OWL validation of an ontology that uses SHACL to halt, because the engine considers sh:-prefixed concepts to be incompletely defined.

UCO needs some constraints from OWL 2 DL, such as ensuring disjointness of object properties from datatype properties. OWL 2 DL also imposes significant node constraints that have proven difficult to determine even after several read-throughs of the specifications, and misunderstanding those constraints could accidentally move a UCO graph out of OWL 2 DL and into OWL 2 Full, where the behavior of OWL 2 DL tooling is not guaranteed. To wit, prior considerations for ontology versioning and for reification of triples both underwent significant strategic revisions after part of the intended strategy was found to be disallowed in OWL 2 DL.

SHACL provides SPARQL-based mechanisms (in SHACL-SPARQL; see examples) to identify triple combinations that should not appear in an ontology graph or data graph. UCO should make best-effort use of SPARQL-based constraints to validate its OWL usage.
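As a minimal sketch of the kind of triple combination such a constraint would flag, the following Python assumes a graph already reduced to (subject, predicate, object) tuples of prefixed names, and looks for properties typed as both owl:ObjectProperty and owl:DatatypeProperty, the disjointness case mentioned above. The function name dual_typed_properties and the tuple representation are hypothetical. The SPARQL string is a standalone query expressing the same pattern; it is not executed here and is not taken from UCO's shapes.

```python
# Hypothetical illustration: a pure-Python analogue of a SPARQL-based check
# for properties declared as both owl:ObjectProperty and owl:DatatypeProperty.
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Standalone SPARQL query for the same pattern; shown for reference only,
# this sketch does not execute it.
SPARQL_SELECT = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?property
WHERE {
  ?property a owl:ObjectProperty , owl:DatatypeProperty .
}
"""


def dual_typed_properties(triples: Iterable[Triple]) -> Set[str]:
    """Return subjects typed as both owl:ObjectProperty and owl:DatatypeProperty."""
    object_props: Set[str] = set()
    datatype_props: Set[str] = set()
    for s, p, o in triples:
        if p != "rdf:type":
            continue
        if o == "owl:ObjectProperty":
            object_props.add(s)
        elif o == "owl:DatatypeProperty":
            datatype_props.add(s)
    # A property in both sets violates OWL 2 DL's separation of the two kinds.
    return object_props & datatype_props
```

For example, a graph declaring ex:hasName with both types and ex:hasPart only as an object property would yield {"ex:hasName"}.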

Requirements

Requirement 1

UCO must be able to validate its conformance against OWL 2 DL, at least partially.

Requirement 2

Extensions to UCO (such as ontology revisions under draft outside of this Git repository, and private extensions) must be able to use UCO's OWL 2 DL conformance-review mechanism.

Requirement 3

The transitive closure of UCO's imports must be testable with at least the same OWL 2 DL stringency as is applied to UCO.

Risk / Benefit analysis

Benefits

  • Definition of OWL 2 DL conformance in SHACL shapes adds a review mechanism that is compatible with UCO's usage of both OWL and SHACL.
  • Recent proposals (such as CASE's AnalyticInference proposal, and a paused proposal reviewing UCO's syntax of enumerant-based datatypes) have been significantly slowed by early attempts to exercise OWL mechanisms. Having mechanically reviewed rules will reduce confusion in the design and implementation of new proposals.
  • Review with OWL-focused SHACL shapes will help UCO measure the risk of adopting new ontologies.

Risks

  • The goal of this proposal is NOT to implement all of OWL 2 DL in SHACL. Full OWL 2 DL review needs to handle operations like expansion of abstract class definitions, identification of constraints that reduce to empty sets, and inconsistency detection, such as recognizing when an empty set is also asserted to have a member. It is not clear whether this is possible with SHACL and SPARQL.
  • It is possible that an effort to validate OWL 2 DL with SHACL (to the maximal extent possible) already exists. The proposer has not been able to locate such an effort.
  • Most of the OWL-focused constraints seem to require SHACL-SPARQL to implement. While these may seem expensive for review, they will only infrequently (if ever) run on the "ABox" graphs of users' knowledge bases, that is, the portions defining concrete individuals rather than classes and properties. So their impact on SHACL validation is expected to be felt only in unit testing of "TBox" (class/property/datatype) focused graphs, like those in the UCO Git repository. (The JSON-LD samples under tests/examples/ in this repository are examples of "ABox" graphs.)
  • UCO CP-100 took a shortcut with rdf:List for the purpose of easing maintenance of OWL enumerant-based datatypes, and of letting UCO's semi-open vocabularies reference member lists in SHACL shapes. This shortcut was called out as a known act of delaying an OWL 2 DL conformant implementation. For better or worse, the SHACL shapes accompanying this proposal flag that shortcut as an error, requiring that it be undone. This causes two risks:
    • Test timing - This will at least double parallel-testing time (i.e., make -j) and triple non-parallel testing time (make without -j, as the CI runs it), because rdf-toolkit takes an extensive amount of time to sort long rdf:Lists, especially those in the vocabulary namespace, and those lists will now be duplicated in the observable namespace. This does not currently pose a risk of timeouts on GitHub Actions, as the default job timeout is currently 6 hours.
    • List consistency - So long as UCO uses the current semi-open vocabulary design, an additional list-review mechanism needs to be deployed to ensure that vocabulary members copied into SHACL shapes match the members as recorded in rdfs:Datatypes.
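A list-review mechanism of the kind described in the last risk could, under stated assumptions, walk each rdf:List through rdf:first/rdf:rest and compare the member sets. The sketch below assumes triples are (subject, predicate, object) tuples of prefixed names, and that the head node of the datatype's member list and of the SHACL shape's member list are already known; how those heads are located in UCO's actual files is not shown, and the function names are hypothetical.

```python
# Hypothetical sketch: compare members of two rdf:Lists, e.g. one recorded
# on an rdfs:Datatype and one copied into a SHACL shape.
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def rdf_list_members(triples: Iterable[Triple], head: str) -> List[str]:
    """Walk an rdf:List from head via rdf:first/rdf:rest until rdf:nil."""
    first: Dict[str, str] = {}
    rest: Dict[str, str] = {}
    for s, p, o in triples:
        if p == "rdf:first":
            first[s] = o
        elif p == "rdf:rest":
            rest[s] = o
    members: List[str] = []
    seen: Set[str] = set()
    node = head
    while node != "rdf:nil":
        if node in seen:
            raise ValueError(f"cycle in rdf:List at {node}")
        if node not in first or node not in rest:
            raise ValueError(f"malformed rdf:List node {node}")
        seen.add(node)
        members.append(first[node])
        node = rest[node]
    return members


def list_mismatch(
    triples: Iterable[Triple], datatype_head: str, shacl_head: str
) -> Tuple[Set[str], Set[str]]:
    """Return (members missing from the SHACL list, extra SHACL members)."""
    triples = list(triples)  # iterated once per list below
    datatype_members = set(rdf_list_members(triples, datatype_head))
    shacl_members = set(rdf_list_members(triples, shacl_head))
    return datatype_members - shacl_members, shacl_members - datatype_members
```

Comparing as sets deliberately ignores ordering, since sh:in and owl:oneOf membership does not depend on list order; a stricter review could also compare the lists element by element.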

Competencies demonstrated

Competency 1

As part of CI testing, UCO can now review its conformance with OWL 2 DL.

Competency Question 1.1

What does UCO define as best-effort conformant with OWL 2 DL?

Result 1.1

A review of the uco-owl namespace shows shapes that quote and link the OWL 2 specification.

Competency Question 1.2

How does UCO test that its (TBox) ontology is conformant with OWL 2 DL?

Result 1.2

Within the CI, a monolithic build of UCO is constructed, combining all of the Turtle files under the ontology/ directory. Before that file is syntax-normalized, pyshacl is used to review the combined file with the uco-owl namespace's shapes. See tests/Makefile, target uco_monolithic.ttl.

Competency Question 1.3

What other ontologies can be reviewed with the uco-owl namespace?

Result 1.3

The uco-owl namespace tests conformance with OWL 2 DL, as well as some implications for SHACL shapes, such as confirming that owl:DatatypeProperty instances used in PropertyShapes aren't constrained to match non-Literals. These shapes can apply to ontologies that are more focused on TBoxes (classes/properties/datatypes), or to broader ABox knowledge bases such as tool output mapped into UCO.
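To illustrate the DatatypeProperty/PropertyShape check in isolation, here is a minimal sketch that flags property shapes whose sh:path is an owl:DatatypeProperty while their sh:nodeKind admits only IRIs or blank nodes. It assumes prefixed-name tuples as input, handles only simple IRI paths, and considers only sh:nodeKind; the actual uco-owl shapes may check this differently or more thoroughly, and the function name is hypothetical.

```python
# Hypothetical sketch: find property shapes that constrain a datatype
# property's values to non-literal node kinds.
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# sh:nodeKind values that exclude literals.
NON_LITERAL_NODE_KINDS = {"sh:IRI", "sh:BlankNode", "sh:BlankNodeOrIRI"}


def conflicting_property_shapes(triples: Iterable[Triple]) -> Set[str]:
    """Return shapes whose sh:path is an owl:DatatypeProperty but whose
    sh:nodeKind only admits IRIs or blank nodes."""
    triples = list(triples)
    datatype_props = {
        s for s, p, o in triples if p == "rdf:type" and o == "owl:DatatypeProperty"
    }
    # Complex (blank-node) paths never match datatype_props, so they are ignored.
    paths: Dict[str, str] = {s: o for s, p, o in triples if p == "sh:path"}
    node_kinds: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == "sh:nodeKind":
            node_kinds.setdefault(s, set()).add(o)
    return {
        shape
        for shape, path in paths.items()
        if path in datatype_props
        and node_kinds.get(shape, set()) & NON_LITERAL_NODE_KINDS
    }
```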

For the broader ABox-oriented knowledge bases, current support consists of reviewing the transitive closure of ontology imports, and owl:Axioms used for assertion annotations. See especially the shapes pertaining to owl:Axiom, owl:ontologyIRI, and owl:versionIRI.
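The transitive closure of imports (also the subject of Requirement 3) can be pictured as a breadth-first walk over owl:imports. This minimal sketch assumes all loaded files have been merged into one list of prefixed-name tuples, and treats an import as unresolved when no loaded triple declares it an owl:Ontology. Matching imports against owl:versionIRI values, which OWL 2 also permits, is deliberately left out. The function name is hypothetical.

```python
# Hypothetical sketch: compute the owl:imports closure of a root ontology
# and report imported IRIs with no owl:Ontology declaration in the loaded graph.
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def imports_closure(triples: Iterable[Triple], root: str) -> Tuple[Set[str], Set[str]]:
    """Return (reachable ontology IRIs, unresolved ontology IRIs)."""
    imports: Dict[str, Set[str]] = {}
    declared: Set[str] = set()
    for s, p, o in triples:
        if p == "owl:imports":
            imports.setdefault(s, set()).add(o)
        elif p == "rdf:type" and o == "owl:Ontology":
            declared.add(s)
    reachable = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for target in imports.get(current, set()):
            if target not in reachable:  # also guards against import cycles
                reachable.add(target)
                queue.append(target)
    return reachable, reachable - declared
```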

Solution suggestion

  • Add UCO-OWL namespace, IRI https://ontology.unifiedcyberontology.org/owl, prefix uco-owl:.
  • Define SHACL shapes that include citations (using the generic rdfs:seeAlso) to OWL 2 documentation.
    • Use default sh:severity (sh:Violation) for "MUST NOT" pattern matches.
    • Use sh:severity sh:Warning for "SHOULD NOT" pattern matches.
  • Revert assignment of IRIs for rdf:Lists done to ease semi-open vocabulary synchronization.
  • Add unit test for semi-open vocabulary synchronization.
  • Add PASS and XFAIL JSON-LD samples for each uco-owl: shape.
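The severity convention above could be applied mechanically when authoring shapes. The following is a hedged sketch, not UCO's tooling: it renders a SHACL-SPARQL node shape as Turtle text with an rdfs:seeAlso citation, leaving sh:severity implicit (the default sh:Violation) for "MUST NOT" rules and emitting sh:Warning for "SHOULD NOT" rules. The ex: namespace, function name, and parameters are all hypothetical, and the query text is assumed not to contain triple quotes.

```python
# Hypothetical sketch: template a SHACL-SPARQL shape whose severity follows
# the MUST NOT / SHOULD NOT convention.
from typing import Dict, Optional

SEVERITY_BY_KEYWORD: Dict[str, Optional[str]] = {
    "MUST NOT": None,  # default severity (sh:Violation) is left implicit
    "SHOULD NOT": "sh:Warning",
}

PREFIXES = """@prefix ex: <http://example.org/shapes/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
"""


def render_shape(local_name: str, keyword: str, target_class: str,
                 message: str, select: str, see_also: str) -> str:
    """Return Turtle text for one node shape with a sh:SPARQLConstraint."""
    if keyword not in SEVERITY_BY_KEYWORD:
        raise ValueError(f"unsupported keyword: {keyword}")
    severity = SEVERITY_BY_KEYWORD[keyword]
    escaped = message.replace("\\", "\\\\").replace('"', '\\"')
    lines = [PREFIXES, f"ex:{local_name}", "    a sh:NodeShape ;",
             f"    rdfs:seeAlso <{see_also}> ;"]
    if severity is not None:
        lines.append(f"    sh:severity {severity} ;")
    lines += [
        "    sh:sparql [",
        "        a sh:SPARQLConstraint ;",
        f'        sh:message "{escaped}" ;',
        f'        sh:select """{select}""" ;',
        "    ] ;",
        f"    sh:targetClass {target_class} ;",
        "    .",
    ]
    return "\n".join(lines)
```

As a usage example, a shape targeting owl:ObjectProperty with the query SELECT $this WHERE { $this <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#DatatypeProperty> . } would flag dual-typed properties; full IRIs are used in the query so it does not depend on sh:prefixes declarations.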

Coordination

  • Tracking in Jira ticket OC-157
  • Administrative review completed, proposal announced to Ontology Committees (OCs) on 2022-06-29
  • Requirements to be discussed in OC meeting, 2022-07-12
  • Requirements Review vote occurred, passing, on 2022-07-12
  • Requirements development phase completed.
  • Solution announced to OCs on 2022-07-22
  • Solutions Approval to be discussed in OC meeting, 2022-07-28.
  • Solutions Approval vote occurred, passing, on 2022-08-09
  • Solutions development phase completed.
  • Implementation merged into develop
  • Milestone linked
  • Documentation logged in pending release page

PR 407 is posted to help with review, but it will be replaced with another patch series after tomorrow's meeting.

An objection to this proposal's Solution was made in a committee meeting, and has been logged here.

For the OCs' awareness - there is another effect of adding OWL review using pyshacl. It is noted in this comment.