casework/CASE

`InvestigativeAction`s should be required to produce at least one `ProvenanceRecord`

ajnelson-nist opened this issue · 5 comments

Background

Discussion on CASE Issue 136 suggests that an InvestigativeAction should always result in the creation of at least one ProvenanceRecord.

Requirements

Requirement 1

CASE should enforce that an InvestigativeAction results in at least one ProvenanceRecord.

As an implementation note, this would be done with a qualified SHACL constraint.

Edited 2024-02-15: "Must" relaxed to "should".

Requirement 2

CASE should describe in a mechanically discoverable way that an InvestigativeAction is expected to always result in at least one ProvenanceRecord.

As an implementation note, this would be done with a qualified minimum cardinality in an OWL Restriction.

Risk / Benefit analysis

Benefits

  1. Requiring that a ProvenanceRecord always be generated ties the resultant objects of InvestigativeActions into the chain of custody of forensic processing.
  2. Reintroduction of OWL constructs will assist with OWL-specific review mechanisms that do not appear to be possible in SHACL, such as set-satisfiability checking (e.g., determining through set-theoretic analysis whether a class or restriction has accidentally come to denote the empty set, which would make usage conformant with the specification impossible).
    1. This is acknowledged to be a broader issue than this one proposal. However, a minimum cardinality restriction appears to the submitter to be a "safe" reintroduction in terms of complexity.

Risks

  1. Existing SHACL shapes require that a ProvenanceRecord always have at least one member UcoObject. Thus, this proposal would impose a significant requirement on InvestigativeActions: they must always result in something aside from the ProvenanceRecord.
    1. Note that an object being a result of an action does not necessarily imply that the object was created by the action. This stemmed from discussion on UCO Issue 558.
    2. It is possible the definition of ProvenanceRecord is too stringent. A somewhat separate concern is that there might exist a class of InvestigativeActions that truly have no results. Perhaps: "This action found all files within this directory. There were none."
    3. NOTE: Risk 1 was mitigated by the resolution of UCO Issue 599. ProvenanceRecords may now be empty.
  2. Some Actions might need to be defined in a manner that restricts their results to a specific class, e.g., IP addresses. If such an action class were introduced, it could never be an InvestigativeAction, because an InvestigativeAction would be required to include a ProvenanceRecord among its results. Hence, this proposal would induce an upstream design constraint on UCO: action:result can never be constrained with owl:allValuesFrom, because UCO doesn't "know" about case-investigation:ProvenanceRecord.
  3. This proposal does not specify whether there must be only one ProvenanceRecord among the results. This point was left inconclusive in the discussion on CASE Issue 136, and its answer could depend on whether the committee decides a subaction's ProvenanceRecord should also be recorded in the parent action's results.
  4. This proposal suggests restoring OWL practices, starting with a description of at least one of the outputs for any InvestigativeAction. CASE and UCO previously abandoned OWL in UCO 0.7.0 / CASE 0.5.0. This proposal starts a disciplined reintroduction of OWL constructs, testing with the UCO-OWL syntax review mechanisms.
    1. UCO Change Proposal 23 housed discussion, though it appears that document was not exported from the access-controlled UCO Confluence space. (I don't think there is a reason it wasn't, aside from document exports only becoming a mandated part of the proposal process in later releases.)
    2. A test focused on the syntax used will be added in a separate proposal to UCO.
  5. Due to needing SHACL qualified shapes, the CASE testing infrastructure also needs to require pySHACL >= 0.24.0, which incorporates a resolution to pySHACL Issue 213.
  6. (Added 2024-02-15.) In information sharing situations, some data might be restricted from being shared or alluded to, e.g., due to legally imposed redactions. If Org1 shares part of a graph with Org2, and includes some InvestigativeAction for, say, its timing and tool-use relevance, but doesn't share the identifier for the generated ProvenanceRecord, the shared data should by itself still be conformant to UCO, and should not produce UCO validation errors when folded into the receiving organization's knowledge base.
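To make Risk 2 concrete, here is a sketch in Turtle of the kind of action class it describes. The `ex:` namespace and `ex:IPAddressLookupAction` are hypothetical, and `uco-observable:IPAddress` stands in for "IP addresses". If the second axiom were asserted, every result of such an action would have to be an IPAddress while at least one result would have to be a ProvenanceRecord. Satisfying both requires some node typed as both classes; if those classes were ever declared disjoint, an OWL reasoner would find the class unsatisfiable, the empty-set situation Benefit 2 describes.

```turtle
@prefix ex: <http://example.org/ontology/> .
@prefix investigation: <https://ontology.caseontology.org/case/investigation/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix uco-action: <https://ontology.unifiedcyberontology.org/uco/action/> .
@prefix uco-observable: <https://ontology.unifiedcyberontology.org/uco/observable/> .

# Hypothetical action class whose results are all IP addresses.
ex:IPAddressLookupAction
	a owl:Class ;
	rdfs:subClassOf
		uco-action:Action ,
		[
			a owl:Restriction ;
			owl:onProperty uco-action:result ;
			owl:allValuesFrom uco-observable:IPAddress ;
		]
		;
	.

# Hypothetical second axiom.  Combined with the proposed restriction on
# InvestigativeAction (owl:minQualifiedCardinality 1 on uco-action:result,
# qualified by investigation:ProvenanceRecord), every result must be an
# IPAddress, yet at least one result must be a ProvenanceRecord.
ex:IPAddressLookupAction
	rdfs:subClassOf investigation:InvestigativeAction ;
	.
```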

Competencies demonstrated

Competencies are omitted from this proposal, as the effects are new restrictions on data, and hence do not enable new expressive abilities.

Solution suggestion

For CASE 1.x.0, add the following to investigation.ttl:

```turtle
investigation:InvestigativeAction
	rdfs:subClassOf [
		a owl:Restriction ;
		owl:onProperty uco-action:result ;
		owl:onClass investigation:ProvenanceRecord ;
		owl:minQualifiedCardinality "1"^^xsd:nonNegativeInteger ;
	] ;
	sh:property [
		sh:message "An InvestigativeAction should have a ProvenanceRecord among its results.  This will be a requirement in CASE 2.0.0."@en ;
		sh:path uco-action:result ;
		sh:qualifiedMinCount "1"^^xsd:integer ;
		sh:qualifiedValueShape [
			a sh:NodeShape ;
			sh:class investigation:ProvenanceRecord ;
		] ;
		sh:severity sh:Warning ;
	] ;
	.
```
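For illustration, here is hypothetical instance data that could be checked against the proposed shape, assuming the CASE 1.x and UCO 1.x namespace IRIs and an example `kb:` namespace (all `kb:` node names are made up). The first action has a ProvenanceRecord among its results and satisfies the qualified minimum count. The second action has a result, but none typed as a ProvenanceRecord, so the shape would report it, with `sh:Warning` severity in the CASE 1.x form.

```turtle
@prefix investigation: <https://ontology.caseontology.org/case/investigation/> .
@prefix kb: <http://example.org/kb/> .
@prefix uco-action: <https://ontology.unifiedcyberontology.org/uco/action/> .
@prefix uco-core: <https://ontology.unifiedcyberontology.org/uco/core/> .
@prefix uco-observable: <https://ontology.unifiedcyberontology.org/uco/observable/> .

# Satisfies the qualified minimum count: one result is a ProvenanceRecord.
kb:InvestigativeAction-1
	a investigation:InvestigativeAction ;
	uco-action:result
		kb:File-1 ,
		kb:ProvenanceRecord-1
		;
	.

kb:ProvenanceRecord-1
	a investigation:ProvenanceRecord ;
	uco-core:object kb:File-1 ;
	.

kb:File-1
	a uco-observable:File ;
	.

# Has a result, but no result is a ProvenanceRecord, so the proposed
# property shape would produce a validation result for this node.
kb:InvestigativeAction-2
	a investigation:InvestigativeAction ;
	uco-action:result kb:File-2 ;
	.

kb:File-2
	a uco-observable:File ;
	.
```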

For CASE 2.0.0, remove the sh:message and sh:severity triples from the added sh:PropertyShape.
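Applying that removal would leave a property shape like the following; the OWL restriction is unchanged, so only the `sh:property` part is repeated here. With no `sh:severity` triple, SHACL reports results from this shape at the default severity, `sh:Violation`.

```turtle
@prefix investigation: <https://ontology.caseontology.org/case/investigation/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix uco-action: <https://ontology.unifiedcyberontology.org/uco/action/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# CASE 2.0.0 form: sh:message and sh:severity removed, so any result from
# this shape takes the SHACL default severity, sh:Violation.
investigation:InvestigativeAction
	sh:property [
		sh:path uco-action:result ;
		sh:qualifiedMinCount "1"^^xsd:integer ;
		sh:qualifiedValueShape [
			a sh:NodeShape ;
			sh:class investigation:ProvenanceRecord ;
		] ;
	] ;
	.
```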

Coordination

  • Administrative review completed, proposal announced to Ontology Committees (OCs) on Jan. 26, 2024
  • Requirements to be discussed in OC meeting, date Feb. 15, 2024
  • Risk 1 addressed - InvestigativeActions that have no non-ProvenanceRecord results confirmed supportable.
  • Requirements to be discussed in OC meeting, date TBD.
  • Requirements Review vote has not occurred
  • Requirements development phase completed.
  • Solution announced to OCs on TODO-date
  • Solutions Approval to be discussed in OC meeting, date TBD
  • Solutions Approval vote has not occurred
  • Solutions development phase completed.
  • Backwards-compatible implementation merged into develop for the next release
  • develop state with backwards-compatible implementation merged into develop-2.0.0
  • Backwards-incompatible implementation merged into develop-2.0.0 (or N/A)
  • Milestone linked
  • Documentation logged in pending release page
  • Prerelease publication: CASE develop branch updated to track UCO's updated develop branch
  • Prerelease publication: CASE develop-2.0.0 branch updated to track UCO's updated develop-2.0.0 branch

While I agree with the intended spirit of this proposal, I do not feel it is viable due to Risk 1 and Risk 2 above.

I do not believe either of these risks can be ignored in favor of the intent of this proposal.

I believe that Risk 2 is real and could have a significant impact if ignored.
I believe that Risk 1 is very real and WILL have a critical impact if ignored. There are certainly investigative actions that could have no result.

We can say that an InvestigativeAction SHOULD have at least one ProvenanceRecord but we cannot say MUST.

More on Risk 1:

I'm more inclined to review and revise that minimum-count 1 SHACL rule on ContextualCompilation. This is not the first place that has caused an issue: the experimental extension ontology in CASE-Corpora is trying an alignment between DCAT-US (in short, a model for datasets) and CASE+UCO. Some things under DCAT-US looked like philosophical kin to ContextualCompilation, but would at times be appropriately empty (e.g., datasets with distribution files, but no publicly available distribution files). The sh:minCount 1 rule inherited from ContextualCompilation calls that a data error. So there is some subclassing in that repository that feels ...contortive.

I'm glad you think it is appropriate to represent investigative actions that have no non-provenance-record results. I think it's a little strange-feeling to have a provenance record with no members as the sole result of an investigative action, but it isn't necessarily wrong. For instance, it could support a further sanity check downstream in CASE analysis, triggered if that "empty" provenance record were used by a later investigative action and nothing in the (empty) provenance record was also an input to that same investigative action. (This is inching out of scope of this proposal, but my gut's saying that's a sanity check I would be grateful to have; it sounds like it would catch copy-paste errors stemming from copying the wrong thing.)
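A sketch of that scenario, with hypothetical `kb:` node IRIs: the first action's sole result is a ProvenanceRecord with no members (permissible after UCO Issue 599), and a later action takes that record as an input while also acting on a file that is not a member of it. That mismatch is the pattern the sanity check would look for; the check itself is not specified here.

```turtle
@prefix investigation: <https://ontology.caseontology.org/case/investigation/> .
@prefix kb: <http://example.org/kb/> .
@prefix uco-action: <https://ontology.unifiedcyberontology.org/uco/action/> .
@prefix uco-observable: <https://ontology.unifiedcyberontology.org/uco/observable/> .

# "Find all files in this directory" found none: the only result is a
# ProvenanceRecord with no uco-core:object members.
kb:InvestigativeAction-1
	a investigation:InvestigativeAction ;
	uco-action:result kb:ProvenanceRecord-1 ;
	.

kb:ProvenanceRecord-1
	a investigation:ProvenanceRecord ;
	.

# A later action cites the empty record as an input, but also acts on a
# file the record does not contain (possibly a copy-paste error).
kb:InvestigativeAction-2
	a investigation:InvestigativeAction ;
	uco-action:object
		kb:ProvenanceRecord-1 ,
		kb:File-1
		;
	uco-action:result kb:ProvenanceRecord-2 ;
	.

kb:ProvenanceRecord-2
	a investigation:ProvenanceRecord ;
	.

kb:File-1
	a uco-observable:File ;
	.
```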

I think Risk 1 stems solely from ContextualCompilation having used SHACL for its minimum member count description instead of OWL. A SHACL minimum-1 count, anywhere, induces validation failures for incomplete information, so it is a construct that must be used sparingly. Should a UCO graph fail validation because it named a set (ContextualCompilation) but said nothing of its members? This is a bigger question for data sharing, which I'm noting here because this might be another risk specific to this proposal. Here's an example:

If Org1 shares part of a graph with Org2, and includes some InvestigativeAction for, say, its timing and tool-use relevance, but doesn't share the identifier for the generated ProvenanceRecord, should that shared data fail validation?
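The sharing scenario can be sketched as instance data, assuming hypothetical `kb:` IRIs and an arbitrary timestamp. Org1's excerpt keeps the action's start time and instrument but omits every `uco-action:result` triple. Under the proposed CASE 1.x shape this yields a validation result with `sh:Warning` severity; under the planned CASE 2.0.0 form, with no `sh:severity`, the same excerpt would be reported as an `sh:Violation`.

```turtle
@prefix investigation: <https://ontology.caseontology.org/case/investigation/> .
@prefix kb: <http://example.org/kb/> .
@prefix uco-action: <https://ontology.unifiedcyberontology.org/uco/action/> .
@prefix uco-tool: <https://ontology.unifiedcyberontology.org/uco/tool/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Excerpt shared by Org1 with Org2.  The generated ProvenanceRecord exists
# in Org1's graph, but its identifier is withheld, so no uco-action:result
# triple appears here.
kb:InvestigativeAction-3
	a investigation:InvestigativeAction ;
	uco-action:startTime "2024-01-01T00:00:00Z"^^xsd:dateTime ;
	uco-action:instrument kb:Tool-1 ;
	.

kb:Tool-1
	a uco-tool:Tool ;
	.
```

By contrast, an OWL reasoner reading the `owl:minQualifiedCardinality` restriction would not treat this excerpt as inconsistent: under the open-world assumption it would only infer that some unnamed ProvenanceRecord result exists. The failure comes from the SHACL side.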

After discussion on this morning's call, it is likely that that spelling change for ContextualCompilation will be proposed.

From discussion on this morning's call, we felt the risks (including the one on information sharing, realized just prior to the call) left us uncertain whether the requirements are sufficiently captured. We will return to this after proposing at least one upstream matter on UCO to address Risk 1.

The proposal has received some revisions (marked with the string "2024-02-15") and an extra step in its coordination checklist.

Risk 1 has been addressed with the resolution of UCO Issue 599.