Add versioning metadata to the Collection
Publishing a dataset as an LDES often entails adding some additional statements (e.g., dct:isVersionOf), but this data then becomes indistinguishable from the original data elements. This can be an issue for operations such as version materialization, which should yield the current version of each concept as it would appear without this LDES-specific metadata.
The specification currently describes how to add a version key to a retention policy, but this cannot be used for collections without a retention policy. Furthermore, the version key only specifies which predicate is used to link the version URI to the concept URI, but there's often another predicate that is used to assign a timestamp to this version. This timestamp metadata would also be useful for other issues, such as #16.
I would propose to move the versioning metadata (the version key and timestamp predicate) to the Collection description, and possibly make it mandatory. Perhaps it can become part of the shape description.
I agree with the suggestion to define this on top of the LDES entity. I think defining it should be a best practice, but not a requirement. When you define it, you get a lot of benefits, but functionally everything could keep working without it being described.
Other use cases
This information would be needed to understand:
- how an automatic version materialization could work (related: TREEcg/event-stream-client#13)
- how a version-based retention policy would know what the “last” version is (related: #16)
- which property the members of one page need to be ordered by to understand how the collection grows, and thus also which members in the page can be automatically discarded when processing that page again (this use case is currently hard-coded in the Event Stream client: https://github.com/TREEcg/event-stream-client/blob/31444acb768639745d219b286e050e002f7f38d1/packages/actor-init-ldes-client/lib/EventStream.ts#L209)
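To make the third use case concrete, here is a minimal Python sketch of what a client could do once the timestamp path is known. It is not the Event Stream client's actual logic. It assumes members are plain dicts keyed by predicate, that `timestamp_path` is a single predicate standing in for the value of the proposed ldes:timestampPath, and `new_members` is a hypothetical helper name.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an xsd:dateTime-style string; a trailing 'Z' is mapped to +00:00."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def new_members(page, timestamp_path, last_seen=None):
    """Order the members of a page by timestamp and drop those at or
    before `last_seen`, the newest timestamp already processed."""
    ordered = sorted(page, key=lambda m: parse_ts(m[timestamp_path]))
    if last_seen is None:
        return ordered
    cutoff = parse_ts(last_seen)
    return [m for m in ordered if parse_ts(m[timestamp_path]) > cutoff]
```

Note that comparing on the timestamp alone would also drop an unseen member that shares its timestamp with the last processed one, so a real client would probably need to track member identifiers as well.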
Suggestion
ex:ES1 a ldes:EventStream ;
    tree:shape <...> ;
    ldes:versionOfPath dcterms:isVersionOf ;
    ldes:timestampPath dcterms:created .
ldes:versionOfPath is a property path (SHACL) to the property that indicates the URI of the non-versioned object.
ldes:timestampPath is a property path (SHACL) to the timestamp.
Effects
Retention policy
This part of the spec won’t really change:
A version-based retention policy can be defined based on the original collection’s data, but this can also be overridden in the policy itself. The policy can also have the property ldes:versionKey, which is an rdf:List of object identifier paths indicating that they must be combined. This is particularly useful for, e.g., sensor data, to indicate the last 5 observations of a sensor’s observed property (ldes:versionKey ( ( sosa:observedProperty ) ( sosa:madeBySensor ) ) .).
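As an illustration of how a combined version key could drive such a policy, here is a small Python sketch under a few assumptions: members are plain dicts keyed by predicate, each path in the version key is a single predicate, `sosa:resultTime` is assumed to be the timestamp path, and `retain_latest` and its `amount` parameter are hypothetical names rather than anything the spec defines.

```python
from collections import defaultdict
from datetime import datetime


def parse_ts(value):
    """Parse an xsd:dateTime-style string; a trailing 'Z' is mapped to +00:00."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def retain_latest(members, version_key, timestamp_path, amount):
    """Keep the `amount` most recent members for each combination of
    values found at the paths listed in `version_key`."""
    groups = defaultdict(list)
    for member in members:
        # The combined key, e.g. (observed property, sensor).
        key = tuple(member[path] for path in version_key)
        groups[key].append(member)

    kept = []
    for group in groups.values():
        group.sort(key=lambda m: parse_ts(m[timestamp_path]))
        # max(...) avoids the group[-0:] pitfall when amount is 0.
        kept.extend(group[max(len(group) - amount, 0):])
    return kept
```

Calling `retain_latest(observations, ["sosa:observedProperty", "sosa:madeBySensor"], "sosa:resultTime", 5)` would then keep at most five observations per sensor and observed property.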
Version Materializations spec proposal
An official version materialization can be defined only if the original LDES defines both ldes:versionOfPath and ldes:timestampPath.
A version materialization replaces the subject of a member with its ldes:versionOfPath IRI, and filters the data either to match a certain version identifier or to select the latest version of each member up to a certain timestamp literal.
A version materialization thus converts e.g., an LDES like this:
ex:ES1 a ldes:EventStream ; # + proposed metadata see ↑
    tree:member [
        dcterms:isVersionOf <A> ;
        dcterms:created "2020-10-05T11:00:00Z" ;
        owl:versionInfo "v0.0.1" ;
        rdfs:label "A v0.0.1"
    ], [
        dcterms:isVersionOf <A> ;
        dcterms:created "2020-10-06T13:00:00Z" ;
        owl:versionInfo "v0.0.2" ;
        rdfs:label "A v0.0.2"
    ].
into
ex:ES1v1 a tree:Collection ; # the members are no longer immutable
    ldes:versionMaterializationOf ex:ES1 ;
    ldes:versionMaterializationUntil "2020-10-05T12:00:00Z"^^xsd:dateTime ;
    tree:member <A> .
<A> rdfs:label "A v0.0.1" .
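The conversion above could be sketched roughly as follows in Python. This is an illustration under assumptions, not a reference algorithm: members are plain dicts keyed by predicate, both paths are single predicates, and `materialize` is a hypothetical function name. The sketch only strips the two statements identified by the versioning metadata. Whether other version-related statements such as owl:versionInfo should also be dropped, as in the example output above, is something the proposal would still need to settle.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an xsd:dateTime-style string; a trailing 'Z' is mapped to +00:00."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def materialize(members, version_of_path, timestamp_path, until):
    """Return {concept IRI: properties} with, per concept, the latest
    version whose timestamp is at or before `until`."""
    cutoff = parse_ts(until)
    latest = {}  # concept IRI -> (timestamp, member)
    for member in members:
        created = parse_ts(member[timestamp_path])
        if created > cutoff:
            continue  # this version did not exist yet at `until`
        concept = member[version_of_path]
        if concept not in latest or created > latest[concept][0]:
            latest[concept] = (created, member)

    # Re-key each member on its concept IRI and drop the versioning statements.
    versioning = (version_of_path, timestamp_path)
    return {
        concept: {k: v for k, v in member.items() if k not in versioning}
        for concept, (_, member) in latest.items()
    }
```

With the two example members and `until="2020-10-05T12:00:00Z"`, only the version labelled "A v0.0.1" is selected for `<A>`.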