ucoProject/UCO

Need ability to represent producer-asserted maturity status of any domain object

Bradichus opened this issue · 13 comments

Background

UCO currently lacks any mechanism for a producer to explicitly assert the maturity status of a UcoObject, both to help manage the object's maturation and to give consumers context on how they should view and treat the object. A mechanism is needed to assert such a status on any UcoObject and to convey, at a minimum, whether the object is "Draft" (the object is asserted by the producer to be in an incomplete or evolving state that should not be treated as operationally valid), "Final" (the object is asserted by the producer to be in a state that is complete and correct enough to be treated as operationally valid), or "Deprecated" (the object is asserted by the producer to have no current or future significance while not invalidating any past significance).

Requirements

Requirement 1

UcoObject MUST be able to express a producer-asserted maturity status of 'draft' for the object.

Requirement 2

UcoObject MUST be able to express a producer-asserted maturity status of 'final' for the object.

Requirement 3

UcoObject MUST be able to express a producer-asserted maturity status of 'deprecated' for the object.

Requirement 4

A single UcoObject MUST be able to have different asserted maturity status at different times.

Requirement 5

UcoObject MUST have at most a single maturity status asserted at any given time.

Risk / Benefit analysis

Benefits

Improved context for consumers

Consumers of UCO will have a clearer understanding of the maturity level of each UcoObject. The maturity status directly communicates whether an object is considered operationally valid. This aids consumers in deciding whether to rely on specific UcoObjects for their cybersecurity operations.

Simplified Object Lifecycle Management

With a maturity status mechanism, producers can explicitly declare the state of their objects, helping them and other stakeholders track the progress and evolution of cyber information. Objects can be marked as "Deprecated" when they are no longer significant, streamlining the process of identifying and handling outdated information without invalidating historical references.

Risks

  • Introducing a maturity status mechanism may add complexity to the UCO framework, leading to increased management overhead.

Competencies demonstrated

Competency 1

Ability to query a graph of CDO content to identify "Draft" state objects, "Final" state objects, and any derivation-based relationships between them.

Competency Question 1.1

What are all objects in this CASE investigation's chain of custody that are designated as "Final," but derived from some "Draft" object?

Result 1.1

Solution suggestion

Implementation of the proposed solution is available in #549

Within the core namespace:

  • Add new vocabulary:ObjectStatusVocab vocabulary
  • Add new core:objectStatus property as xsd:string
  • Add new associated property shapes on core:UcoObject for core:objectStatus with cardinality max=1

Solution discussion

The proposed new core:objectStatus provides a simple mechanism for expressing object maturity (Requirement 1).

core:objectStatus a owl:DatatypeProperty ;
	rdfs:comment "The current state of formality and acceptance for a UCO object."@en-US ;
	rdfs:label "Object Status"@en-US ;
	rdfs:range [
		a rdfs:Datatype ;
		owl:unionOf (
			xsd:string
			vocabulary:ObjectStatusVocab
		) ;
	] ;
.

The proposed new vocabulary:ObjectStatusVocab provides capability to support Requirement 1.

vocabulary:ObjectStatusVocab
	a rdfs:Datatype ;
	rdfs:label "Object Status Vocabulary"@en-US ;
	owl:equivalentClass [
		a rdfs:Datatype ;
		owl:onDatatype xsd:string ;
		owl:oneOf (
			"Draft"^^vocabulary:ObjectStatusVocab
			"Final"^^vocabulary:ObjectStatusVocab
			"Deprecated"^^vocabulary:ObjectStatusVocab
		) ;
	] ;
.

The proposed new associated property shapes on core:UcoObject for core:objectStatus

[
	sh:datatype vocabulary:ObjectStatusVocab ;
	sh:message "Value is outside the default vocabulary ObjectStatusVocab." ;
	sh:path core:objectStatus ;
	sh:severity sh:Info ;
] ,
[
	sh:maxCount "1"^^xsd:integer ;
	sh:nodeKind sh:Literal ;
	sh:or (
		[
			sh:datatype vocabulary:ObjectStatusVocab ;
		]
		[
			sh:datatype xsd:string ;
		]
	) ;
	sh:path core:objectStatus ;
] ,
[
	sh:message "Value is not member of the vocabulary ObjectStatusVocab." ;
	sh:or (
		[
			sh:datatype vocabulary:ObjectStatusVocab ;
			sh:in (
				"Draft"^^vocabulary:ObjectStatusVocab
				"Final"^^vocabulary:ObjectStatusVocab
				"Deprecated"^^vocabulary:ObjectStatusVocab
			) ;
		]
		[
			sh:datatype xsd:string ;
		]
	) ;
	sh:path core:objectStatus ;
]

Coordination

  • Tracking in Jira ticket OCUCO-307
  • Administrative review completed, proposal announced to Ontology Committees (OCs) on 2024-06-04
  • Requirements to be discussed in OC meeting, 2024-05-30
  • Requirements to be discussed in OC meeting, 2024-06-25
  • Requirements Review vote occurred, passing, on 2024-06-25
  • Requirements development phase completed.
  • Solution announced to OCs on 2024-08-13
  • Address the question of "rejected" objects - (Not handling as part of this proposal.)
  • Impact from #593 assessed - (Not incorporating before its Solutions Approval vote)
  • Impact from #629 assessed - (Not incorporating before its Solutions Approval vote)
  • Solutions Approval to be discussed in OC meeting, 2024-08-20
  • Solutions Approval vote occurred, passing, on 2024-08-20
  • Solutions development phase completed.
  • Backwards-compatible implementation merged into develop for the next release
  • develop state with backwards-compatible implementation merged into develop-2.0.0
  • Backwards-incompatible implementation merged into develop-2.0.0 (N/A)
  • Milestone linked
  • Documentation logged in pending release page
  • Prerelease publication: CASE develop branch updated to track UCO's updated develop branch
  • Prerelease publication: CASE develop-2.0.0 branch updated to track UCO's updated develop-2.0.0 branch

@sbarnum or @Bradichus - Can this please be demonstrated with one graph file (/JSON-LD snippet/Turtle snippet) that includes a single object with two "maturities?" I think this is likely to be an occurrence when merging two data sources that can potentially discuss the same individual.
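A minimal Turtle sketch of the merge scenario described in this request, assuming the core:objectStatus property and the vocabulary:ObjectStatusVocab datatype from the solution suggestion. The kb: namespace, the individual kb:File-1, and the two data sources are hypothetical and not taken from any UCO or CASE example.

```turtle
@prefix core: <https://ontology.unifiedcyberontology.org/uco/core/> .
@prefix kb: <http://example.org/kb/> .
@prefix observable: <https://ontology.unifiedcyberontology.org/uco/observable/> .
@prefix vocabulary: <https://ontology.unifiedcyberontology.org/uco/vocabulary/> .

# Triples contributed by a first (hypothetical) data source.
kb:File-1
	a observable:File ;
	core:objectStatus "Draft"^^vocabulary:ObjectStatusVocab ;
	.

# Triples contributed by a second (hypothetical) data source that
# describes the same individual.
kb:File-1
	core:objectStatus "Final"^^vocabulary:ObjectStatusVocab ;
	.
```

Once both sources are loaded into one graph, kb:File-1 carries two status values. With the sketched sh:maxCount of 1 on core:objectStatus, such a graph would be expected to fail SHACL validation until the two statuses are reconciled.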

@sbarnum or @Bradichus - I firmly think this proposal needs at least a first-try demonstration. I have denoted that it will not be voted on at this Thursday's meeting.

Analyzing this proposal, we identified an additional aspect that has not yet been identified, as follows.

The purpose of this addition is to allow for a historical trace of data as it evolves through the different statuses. This would imply storing copies of the data graph throughout its life-cycle. Those historical data should not interfere with the current "truth" of the latest graph; instead, they should be reference-able and/or query-able on demand.

There is an important follow-on question on this historical data note, if the committee decides it's worth pursuing: Should the historical graph conform to UCO's SHACL rules? Cardinality constraints are fairly guaranteed to fail when looking at the totality of the history, and probably the same would happen with type (class) constraints.

At last week's meeting, we heard that this proposal intends to leave the question of how to store past versions of objects out of scope.

This is my interpretation of the discussion around Requirement 4: that it is a "Reality of operations" reminder. So as policy, UCO consumers would need to be prepared to handle graph objects that change their status on data refreshes.

With our current tested technology stack of RDF 1.1 triples, not quads, it is not possible to have one graph that has an object in both its "draft" status and its "final" status. I think it would be possible to design something for this with quads ("The graph with this IRI holds true until this time..."), but I haven't personally explored this line of data design far.
I don't think anything in UCO prevents users from moving from triples to quads, but we haven't inlined any testing on this yet among the UCO or CASE examples. I've tried before, and found there is a different writing style necessary for SPARQL querying.

For anyone interested in tinkering, it's a 1-liner change to convert a JSON-LD graph from triples to quads. The CASE examples use a top-level @graph key to house the JSON dicts that comprise the graph. If you give that @graph key an @id, you now have a quads graph.

E.g., this graph is a graph with one triple, and can be interpreted as a graph with one quad, where the graph identifier is anonymous:

{
  "@graph": {
    "@id": "http://example.org/kb/File-1",
    "@type": "https://ontology.unifiedcyberontology.org/uco/observable/File"
  }
}

What follows is an illustration of data conversion and round-tripping, which happens to use rdfpipe[1]. Other RDF tools should also behave similarly, probably just differing in whitespacing/JSON order decisions.

$ rdfpipe --output-format turtle anonymous_graph.jsonld 

<http://example.org/kb/File-1> a <https://ontology.unifiedcyberontology.org/uco/observable/File> .

$ rdfpipe --output-format nquads anonymous_graph.jsonld 
<http://example.org/kb/File-1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://ontology.unifiedcyberontology.org/uco/observable/File> _:N235f854b8e7641f092aee6a37fc60de7 .

This graph is a graph with one quad:

{
  "@id": "http://example.org/kb/NamedGraph-1",
  "@graph": {
    "@id": "http://example.org/kb/File-1",
    "@type": "https://ontology.unifiedcyberontology.org/uco/observable/File"
  }
}

rdfpipe can convert this to Turtle (discarding the graph identifier) or to an nquads-format graph that preserves the identifier:

$ rdfpipe --output-format turtle named_graph.jsonld 

<http://example.org/kb/File-1> a <https://ontology.unifiedcyberontology.org/uco/observable/File> .

$ rdfpipe --output-format nquads named_graph.jsonld 
<http://example.org/kb/File-1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://ontology.unifiedcyberontology.org/uco/observable/File> <http://example.org/kb/NamedGraph-1> .

To confirm preservation, here is the round-tripping:

$ rdfpipe --output-format nquads named_graph.jsonld > named_graph.nq
$ rdfpipe --output-format json-ld named_graph.nq
[
  {
    "@graph": [
      {
        "@id": "http://example.org/kb/File-1",
        "@type": [
          "https://ontology.unifiedcyberontology.org/uco/observable/File"
        ]
      }
    ],
    "@id": "http://example.org/kb/NamedGraph-1"
  }
]

Last, the JSON-LD @graph key can be omitted, and the same "up-conversion" of a triples-graph into a quads graph with an anonymous identifier occurs.

{
  "@id": "http://example.org/kb/File-1",
  "@type": "https://ontology.unifiedcyberontology.org/uco/observable/File"
}

$ rdfpipe --output-format turtle one_triple.jsonld

<http://example.org/kb/File-1> a <https://ontology.unifiedcyberontology.org/uco/observable/File> .

$ rdfpipe --output-format nquads one_triple.jsonld 
<http://example.org/kb/File-1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://ontology.unifiedcyberontology.org/uco/observable/File> _:N07ab3e7bf23a4d0c97c08b0dfb9eb596 .

Footnotes

  1. Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.

In the discussion last week, there was some question about the gradation of these statuses. Another interpretation of the "status" of an object could be "Everything is 'Draft,' until it is inconsistent."

There was an expressed desire for this to be an optional property. As proposed and sketched, it is.

This comment is posted as part of solutions discussion - posted early because it might have an impact on requirements.

Sean already posted a SHACL implementation, but it has a side effect of altering UCO's imports - core.ttl currently depends on no other ontology in UCO, but this proposal would make vocabulary.ttl an import dependency.

I think there is a further specification we will need to make in OWL before we consider the right SHACL approach - deciding whether "maturity status" should be represented as a class (well, three classes, from Sean's sketch), a datatype property, or an annotation property.

This comment explores the three options. My personal leaning is that Sean's initial suggested solution is not quite what should be implemented. At the least, I suggest that instead of an owl:DatatypeProperty, the property should be an owl:AnnotationProperty.

Representing object status with classes

This draws design from OWL's owl:DeprecatedClass and owl:DeprecatedProperty.

UCO could add three mutually-disjoint subclasses of UcoObject:

core:DraftUcoObject
	a owl:Class ;
	rdfs:subClassOf core:UcoObject ;
	.
core:FinalUcoObject
	a owl:Class ;
	rdfs:subClassOf core:UcoObject ;
	.
core:DeprecatedUcoObject
	a owl:Class ;
	rdfs:subClassOf core:UcoObject ;
	.
[]
	a owl:AllDisjointClasses ;
	owl:members (
		core:DraftUcoObject
		core:FinalUcoObject
		core:DeprecatedUcoObject
	) ;
	.

SHACL could then be written the way we've handled other class disjointedness, such as in Issue 586.

This has the following benefits:

  • This induces no import-dependency from core.ttl onto vocabulary.ttl.
  • This prevents designating an object status on a non-UcoObject. E.g. designating a types:Hash as having a core:objectStatus would not be caught by Sean's suggested implementation; but designating a types:Hash as a core:FinalUcoObject would trigger a SHACL validation error because hashes are not UcoObjects.
  • The pairwise disjointedness satisfies requirement 5.

Objects would still not be required to designate themselves in any of these "status" classes. (I.e., a status would remain "opt-in" through a type assertion.)

Using classes has a data-design implication that would nudge JSON-LD graphs further into needing to always have their @type JSON keys be array-valued. (Though, this isn't really new to this proposal. RDF always allows for multiple rdf:types to be assigned to an object.)

These classes share a risk described further below in the "Annotation property" section, where there's a bit of muddying between the graph representing reality and the graph talking about itself. (As an aside for @plbt5 and/or anyone read up on gUFO: I think these classes are suggestive of both gufo:Categorys and gufo:Phases, consequently suggesting a meta-modeling divide that may eventually help the "archive" question left out of scope of this proposal.)

Representing with a property

Whether with an owl:DatatypeProperty or owl:AnnotationProperty, this string-valued property would be using an enumeration.

These implementation risks are present with any property-based approach:

  • The solution suggestion induces an import-dependency from core.ttl onto vocabulary.ttl.
  • The solution suggestion uses the semi-open vocabulary design pattern, which, combined with Requirement 5, has the unintended consequence that an end user would not be able to use, on one object, both one of the suggested vocabulary members (draft/final/deprecated) and their own value (e.g., draft-imported). This could be an interoperability or mapping issue if two organizations share data, one needing to use the values in UCO, the other needing to use their own custom values. The class-based approach sketched above bypasses this issue.
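A hypothetical illustration of the conflict in the second bullet, assuming the property and vocabulary names from the solution suggestion. The kb: individual is made up; draft-imported is the custom value used as an example above.

```turtle
@prefix core: <https://ontology.unifiedcyberontology.org/uco/core/> .
@prefix kb: <http://example.org/kb/> .
@prefix observable: <https://ontology.unifiedcyberontology.org/uco/observable/> .
@prefix vocabulary: <https://ontology.unifiedcyberontology.org/uco/vocabulary/> .

kb:File-1
	a observable:File ;
	core:objectStatus
		# A member of the UCO-supplied vocabulary.
		"Draft"^^vocabulary:ObjectStatusVocab ,
		# An organization's own value, permitted on its own by the
		# semi-open pattern as a plain xsd:string.
		"draft-imported"
		;
	.
```

Each value would be acceptable alone under the semi-open pattern, but together they exceed the sketched maximum cardinality of 1.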

Representing with a datatype property

This is as Sean initially proposed.

If using a datatype property, Sean's sketched proposal suggests this analogous OWL specification:

core:UcoObject
	rdfs:subClassOf [
		a owl:Restriction ;
		owl:onProperty core:objectStatus ;
		owl:cardinality "1"^^xsd:nonNegativeInteger ;
	] ;
	.

Speaking to the open-world view, this means there exists exactly one "object status" for any UcoObject. It's fine if the graph doesn't say what it is. The worldview is inconsistent if two are ever found on the same object.

Representing with an annotation property

I'm mentioning this because "maturity status" feels like one of those properties that makes vague the distinction between the graph object and the real-world thing proxied by the graph object. This is related to two points in modeling languages UCO has as foundations:

First is the discussion (under construction) in Issue 606, which re-introduces "Information resource" and "Non-information resource" from some of the early modeling in RDF.

Second is this distinction from OWL, explained in the OWL Primer:

Object properties relate objects to objects (like a person to their spouse), while datatype properties assign data values to objects (like an age to a person). Annotation properties are used to encode information about (parts of) the ontology itself (like the author and creation date of an axiom) instead of the domain of interest.

If a graph node is defined for a person (kb:Person-22342929-b537-4bad-afea-3241facfdb5f), and we assert that the maturity status is 'draft' (kb:Person-22342929-b537-4bad-afea-3241facfdb5f uco-core:objectMaturity "draft" .), we are not calling the person "in draft"; we are calling the conceptualization of the person "in draft". This is, by OWL design, an annotation.

One implementation implication for annotation properties is that they cannot be restricted in OWL class design. I.e., that owl:Restriction for cardinality I suggested above in the owl:DatatypeProperty discussion is not syntactically permitted in OWL. (This can be confirmed by looking at the OWL 2 mapping to RDF document, searching for owl:onProperty in the page, and seeing there is never a usage with an annotation property.)

While this is relevant for an OWL-specific decision on which property-type to use, this happens to not be relevant for SHACL, because SHACL property constraints are indifferent to the distinction between owl:DatatypeProperty, owl:AnnotationProperty, and even rdf:Property; they can all be constrained equivalently.

My belief is that objectMaturity, if to be taken as a property, is an annotation property. This belief happens to extend to other things UCO currently represents as owl:DatatypePropertys, such as core:objectCreatedTime, but those would need to be addressed in separate proposals.

I think the above comments around "makes vague the distinction between the graph object and the real-world thing proxied by the graph object" are overly simplifying the true situation.
In actuality there are three distinct layers in play: 1) the ontology graph of objects defining concepts and their relationships, 2) the data graph of objects each of which is an instance of a concept defined in the ontology graph and represents a characterization of some "real-world" thing, and 3) the real-world thing characterized by one or more data graph objects.
For example, there may exist a Person concept defined as an ontology graph class object, there may be a "real-world" person John Smith, and there may be an Individual object in an rdf data graph that is of type Person and characterizes the "real-world" person John Smith.
All objects in the ontology graph are definitions of "real-world" concepts but do not represent "real-world" instances of those concepts other than Individuals predefined in the ontology (though these are not data graph objects).
All objects in the data graph align to concepts defined in the ontology but themselves are characterizations of "real-world" things and all properties of objects in the data graph are properties of the data graph object characterization of the "real-world" thing NOT properties of the "real-world" thing themselves.
I do not believe there is any ambiguity or vagueness in this distinction.

While owl:AnnotationProperty defines annotation properties as properties for characterizing classes, properties, individuals, and ontology headers within an ontology definition (seeming like they are only defined and "set" within the ontology graph) the OWL spec does specifically list five annotation properties predefined by OWL: owl:versionInfo, rdfs:label, rdfs:comment, rdfs:seeAlso, rdfs:isDefinedBy. Several of these are very commonly used within RDF data graphs as properties on objects.
I would agree that core:objectStatus does seem likely more appropriate as an annotation property (similar to rdfs:label) than as a datatype property.
As discussed in the comment above, this should not prevent appropriate application of cardinality or vocabulary value constraints, as SHACL does not suffer from the restriction limitations on annotation properties that the OWL graph does.

I would vote against a class-based approach for this issue as it makes everything significantly more complicated: 1) adding new values in the future would involve class-level semantic structure rather than simply adding new values, 2) eventual solutions for tracking changes over time would require semantic complexities rather than simply a literal value change of a property, 3) any implementations of labeled property graph representations or UIs for setting/changing/checking object status would be significantly more complicated, etc.

I do not believe it offers any significant value to outweigh the added complexity it brings.
It would be straightforward to avoid the core - vocabulary import dependency by defining core-relevant vocabularies (such as this one) locally within core. Core is already a somewhat special case as the fundamental core of the UCO ontology and this should not be a stretch to do.
The asserted issue of being able to designate objectStatus as a property on a non-UcoObject is no different to basically all properties in CDO. We can use SHACL to assert the same sort of safeguards as we do elsewhere.
The pairwise disjointedness can also be accomplished using a vocabulary of literal values by simply adjusting the SHACL shape to vet for disjointedness among vocab values, though this should not be an issue if the cardinality of the property is simply set to 0..1. I would assert this is the correct approach anyway. The idea is for UCO/CDO to define a single set of maturity status values, where any UcoObject can only be at one of those maturity statuses at a time. If some user wished to define their own statuses in addition to the standard UCO/CDO ones, they should define their own property and vocab for it and not conflate their self-defined values into the standard set. If we allow conflation, it risks muddying the waters, where consumers of CDO content either cannot tell or have to work harder to tell which values are standardized and which are not.

Net-Net:
Based on Alex's excellent comments above regarding annotation properties I would propose the following adjustments to the CP here:
Within the core namespace:

  1. Add new core:ObjectStatusVocab vocabulary ('Draft', 'Final', 'Deprecated')
  2. Add new core:objectStatus annotation property
  3. Add new associated property shapes on core:UcoObject for core:objectStatus as an xsd:string literal with cardinality max=1
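One way the three adjusted items could be sketched in Turtle is shown below. This is only an assumption-laden reading of the list above, not the merged implementation: the shape IRI core:objectStatus-shape is hypothetical, and expressing the fixed vocabulary as an sh:in list of plain xsd:string values is one possible interpretation of items 1 and 3.

```turtle
@prefix core: <https://ontology.unifiedcyberontology.org/uco/core/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Item 2: the property declared as an annotation property.
core:objectStatus
	a owl:AnnotationProperty ;
	rdfs:label "objectStatus"@en ;
	rdfs:comment "The current state of formality and acceptance for a UCO object."@en ;
	.

# Items 1 and 3: a hypothetical node shape that limits the property,
# on any core:UcoObject, to at most one string from a fixed list.
core:objectStatus-shape
	a sh:NodeShape ;
	sh:targetClass core:UcoObject ;
	sh:property [
		sh:datatype xsd:string ;
		sh:in (
			"Draft"
			"Final"
			"Deprecated"
		) ;
		sh:maxCount 1 ;
		sh:path core:objectStatus ;
	] ;
	.
```

Because the constraints live in SHACL, this sketch does not rely on any OWL restriction over the annotation property.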

I had an extensibility concern with the first draft of this proposal: the semi-open vocabulary design pattern UCO uses can't support a value that specializes Draft while intending to still mean Draft, e.g. Draft, not reviewed by 2nd analyst.

From @sbarnum 's comment, I see the intent around ObjectStatusVocab is (now, if not before) to be a fixed vocabulary, not a semi-open vocabulary.

Sean has suggested another extension strategy for maturity statuses: anyone wishing to use object-maturity statuses outside the UCO-supplied vocabulary would need to use their own set of strings and their own property.

Together, these are one way to address my extensibility concern.

Classes also would have addressed my extensibility concern, but I appreciate they carry other issues.

I have no remarks on the labeled property graph issues Sean alluded to.

At this point in the discussion thread, I think the Requirements on this proposal don't need to change.

Re:

The asserted issue of being able to designate objectStatus as a property on a non-UcoObject is no different to basically all properties in CDO. We can use SHACL to assert the same sort of safeguards as we do elsewhere.

I intend to add to the implementation a constraint that the object maturity status property only ever go on core:UcoObject. This is consistent with "UcoObject" appearing throughout the requirements list, and satisfies my whoopsie-catcher of trying to put objectStatus on, e.g., a core:Facet.

core:objectStatus-subjects-shape
    a sh:NodeShape ;
    sh:class core:UcoObject ;
    sh:targetSubjectsOf core:objectStatus ;
    .
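To illustrate what the shape above is meant to catch, here is a small hypothetical data graph. The kb: individuals are made up, and the status values are written as plain strings for brevity.

```turtle
@prefix core: <https://ontology.unifiedcyberontology.org/uco/core/> .
@prefix kb: <http://example.org/kb/> .
@prefix observable: <https://ontology.unifiedcyberontology.org/uco/observable/> .

# Expected to conform, provided the validator can see that
# observable:File is a subclass of core:UcoObject (e.g., by
# validating with the UCO ontology graph included).
kb:File-1
	a observable:File ;
	core:objectStatus "Draft" ;
	.

# Expected to be reported: this node is a subject of
# core:objectStatus but is a core:Facet, not a core:UcoObject.
kb:Facet-1
	a core:Facet ;
	core:objectStatus "Draft" ;
	.
```

Since sh:targetSubjectsOf selects every subject of core:objectStatus, the check applies regardless of which class the subject claims to be.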

I think this is more a discussion point for the solutions period than the requirements period, but I see a slim chance it influences requirements discussion today. Do we want to have a limit on where this property can be used?

I would agree with a constraint that the property is only used on UcoObject.

In drafting another proposal, I arrived at another possible status.

Would the structure of this proposal be able to support an object that is rejected? Or would we need some other practice that rejects an object by marking it "Deprecated" and then doing other things?