ActivityStreams-based retention policies for removed entities
Some back-end systems can expose an event stream of recently updated items, but do not keep track of items that have been deleted. For such systems, a retention policy should exist that states that the LDES conceptually does contain the as:Remove activity, but that it has not been published: a consumer can infer it from the fact that an as:Create published earlier is no longer part of the LDES.
Use cases:
- DCAT-AP feeds in which catalogs just want to publish a dump of their current state
- Explicitly describing what happens in IIIF Change Discovery level 0: https://iiif.io/api/discovery/1.0/#level-0-basic-resource-list
We could also have type-based retention policies, e.g., only keeping members of type as:Remove for 1 year.
Proposal:
1. An ldes:ImplicitRemovalPolicy
This policy says that a consumer will have to infer removals from the fact that something isn’t available on a follow-up traversal of all members in a view.
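```turtle
@prefix ldes: <https://w3id.org/ldes#> .
@prefix as:   <https://www.w3.org/ns/activitystreams#> .

<> ldes:retentionPolicy [
    a ldes:ImplicitRemovalPolicy ;
    ldes:type as:Remove
] .
```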
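A consumer-side sketch of that inference in Python. It assumes the consumer has already reduced each complete traversal of the view to the set of entity IRIs it contains (the objects of the as:Create members); the function name `infer_removals` and the `ex:` IRIs are hypothetical, not part of any LDES client library. The diff is only sound after a full traversal, since a page that simply has not been fetched yet would otherwise look like a removal.

```python
from typing import Dict, List, Set


# Hypothetical helper, not part of any LDES client library.
def infer_removals(previous: Set[str], current: Set[str]) -> List[Dict[str, str]]:
    """Return inferred as:Remove activities for every entity IRI that was
    present on the previous complete traversal of the view but is absent
    from the current one."""
    # Sort so the output order is deterministic (set iteration order is not).
    return [{"type": "as:Remove", "object": iri} for iri in sorted(previous - current)]


if __name__ == "__main__":
    # Hypothetical entity IRIs collected on two successive full traversals.
    first = {"ex:dataset1", "ex:dataset2", "ex:dataset3"}
    second = {"ex:dataset1", "ex:dataset3", "ex:dataset4"}
    print(infer_removals(first, second))
    # [{'type': 'as:Remove', 'object': 'ex:dataset2'}]
```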
2. Adding type filters to other retention policies
By default, when no retention policy is defined, retention is maximal. When at least one retention policy is explicitly defined, the view promises to retain the AND of all policies. This makes the following example tedious, because we need an OR: either we keep the removal for 1 year, or the object still exists and we keep its latest version until it is removed, after which we keep the removal for 1 year...
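```turtle
@prefix ldes: <https://w3id.org/ldes#> .
@prefix tree: <https://w3id.org/tree#> .
@prefix as:   <https://www.w3.org/ns/activitystreams#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

<> ldes:retentionPolicy [
    a ldes:DurationAgoPolicy ;
    tree:value "P1Y"^^xsd:duration ;
    ldes:type as:Remove
  ], [
    a ldes:LatestVersionSubset ;
    ldes:amount 1 ;
    ldes:exceptType as:Remove
  ] .
```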
Adding ldes:type to a retention policy would restrict that policy to objects of a specific type.
However, the difficulty is that the policies must be interpreted as an AND, so the first retention policy would never be applicable: the as:Remove is always the latest version of an object, and the second policy promises to keep that latest version. Therefore, we should introduce an ldes:exceptType clause stating that the latest-version retention policy does not apply to members of type as:Remove.
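To pin down the target behaviour independently of how policies are combined, here is a Python sketch of the OR described above as a single retention predicate: an as:Remove is kept for one year, and any other member is kept only while it is the latest version of an object whose latest version is not an as:Remove. The `Member` class, the `retained` function, the `ex:`/`m…` identifiers, and approximating `P1Y` as `timedelta(days=365)` are all assumptions for illustration; the sketch does not settle how ldes:type and ldes:exceptType should compose.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List

REMOVE = "as:Remove"


@dataclass(frozen=True)
class Member:
    id: str         # member identifier (hypothetical values below)
    entity: str     # IRI of the object this member is a version of
    type: str       # ActivityStreams activity type, e.g. "as:Create"
    time: datetime  # timestamp of the activity


def retained(members: List[Member], now: datetime, remove_window: timedelta) -> List[Member]:
    """Keep a member if EITHER it is an as:Remove younger than remove_window,
    OR it is the latest version of an object that still exists."""
    # Latest member per object (the strictly newest timestamp wins).
    latest: Dict[str, Member] = {}
    for m in members:
        if m.entity not in latest or m.time > latest[m.entity].time:
            latest[m.entity] = m

    def keep(m: Member) -> bool:
        if m.type == REMOVE:
            return now - m.time <= remove_window
        # A non-Remove member survives only while it is the newest version;
        # once a newer as:Remove exists, the object no longer exists.
        return latest[m.entity] is m

    return [m for m in members if keep(m)]


if __name__ == "__main__":
    utc = timezone.utc
    now = datetime(2024, 6, 1, tzinfo=utc)
    log = [
        Member("m1", "ex:a", "as:Create", datetime(2022, 1, 1, tzinfo=utc)),
        Member("m2", "ex:b", "as:Create", datetime(2022, 1, 1, tzinfo=utc)),
        Member("m3", "ex:b", "as:Update", datetime(2023, 1, 1, tzinfo=utc)),
        Member("m4", "ex:b", REMOVE, datetime(2024, 3, 1, tzinfo=utc)),
        Member("m5", "ex:c", "as:Create", datetime(2021, 1, 1, tzinfo=utc)),
        Member("m6", "ex:c", REMOVE, datetime(2022, 5, 1, tzinfo=utc)),
    ]
    # P1Y approximated as 365 days for this sketch.
    print([m.id for m in retained(log, now, timedelta(days=365))])
    # ['m1', 'm4']
```

Here ex:a still exists, so its only version survives; ex:b was removed recently, so only its as:Remove survives; ex:c was removed more than a year ago, so nothing about it is retained.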
I think what we really want to achieve is a retention policy that keeps the latest state (1 version) plus a time window, allowing clients to see all writes.
Related: #36
We could draw inspiration from Kafka's log compaction semantics, as Kafka handles the same problem:
https://docs.confluent.io/kafka/design/log_compaction.html#compaction-enables-deletes
Alternatively, we could consider this a change to the version-based retention semantics: when the latest version is a tombstone, all versions are deleted?
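A Python sketch of the "latest state plus time window" policy with tombstone semantics, loosely modelled on Kafka log compaction (where the tombstone is itself eventually cleaned up, governed by the `delete.retention.ms` topic setting). It assumes an as:Remove member plays the role of the tombstone; `Member`, `compact`, the 30-day window, and the identifiers are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Assumption: an as:Remove member plays the role of Kafka's tombstone.
TOMBSTONE = "as:Remove"


@dataclass(frozen=True)
class Member:
    id: str
    entity: str     # IRI of the versioned object (the compaction key)
    type: str       # ActivityStreams activity type
    time: datetime


def compact(members: List[Member], now: datetime, window: timedelta) -> List[Member]:
    """Latest state plus a time window.

    - Every member younger than `window` is kept, so a client that polls at
      least once per window sees every write, tombstones included.
    - Older members are kept only if they are the latest version of their
      object and that version is not a tombstone; once the latest version
      is a tombstone, all versions disappear as they leave the window.
    """
    latest: Dict[str, Member] = {}
    for m in members:
        if m.entity not in latest or m.time > latest[m.entity].time:
            latest[m.entity] = m
    return [
        m
        for m in members
        if now - m.time <= window
        or (latest[m.entity] is m and m.type != TOMBSTONE)
    ]


if __name__ == "__main__":
    utc = timezone.utc
    now = datetime(2024, 6, 1, tzinfo=utc)
    log = [
        Member("m1", "ex:a", "as:Create", datetime(2024, 1, 1, tzinfo=utc)),
        Member("m2", "ex:a", "as:Update", datetime(2024, 5, 20, tzinfo=utc)),
        Member("m3", "ex:b", "as:Create", datetime(2023, 1, 1, tzinfo=utc)),
        Member("m4", "ex:b", TOMBSTONE, datetime(2023, 2, 1, tzinfo=utc)),
        Member("m5", "ex:c", "as:Create", datetime(2024, 2, 1, tzinfo=utc)),
        Member("m6", "ex:c", TOMBSTONE, datetime(2024, 5, 25, tzinfo=utc)),
        Member("m7", "ex:d", "as:Create", datetime(2023, 6, 1, tzinfo=utc)),
    ]
    print([m.id for m in compact(log, now, timedelta(days=30))])
    # ['m2', 'm6', 'm7']
```

One consequence of this choice: a consumer that stays away longer than the window can miss both intermediate versions and tombstones, and would then have to fall back to inferring removals from absence, as with the proposed ldes:ImplicitRemovalPolicy.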