mimiro-io/datahub

MultiSource: when querying back in time, consider all versions of a changed entity

Currently, when multiSource processes a change found in a dependency, it also queries back in time to find the last version of the changed entity. In some cases this older version had a reference back to our main dataset, which means the previously connected main entity must be emitted as well.

In some cases, there may have been many changes to an entity in a dependency dataset since the last job run, and any of the intermediate versions may have had relevant references. Therefore the source implementation should query back in time for all of these potential versions as well.

[image: sketch drawing]

The sketch drawing illustrates 6 changes to a given entity in a dependency dataset over time, with their respective references to entities in the next joined dataset.

In this example we want E1, E2 and E3 emitted. The current implementation only catches E1.
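To make the requested behaviour concrete, here is a minimal Go sketch of collecting references across every version of a changed entity instead of a single older one. It is not datahub's actual code: the `EntityVersion` type and the `ReferencedAcrossVersions` function are hypothetical names invented for illustration, and the sketch assumes the versions recorded since the last job run have already been loaded into a slice.

```go
package main

import "sort"

// EntityVersion is a hypothetical, simplified snapshot of one version of an
// entity in a dependency dataset. It is not a type from datahub.
type EntityVersion struct {
	ID   string
	Refs []string // IDs of entities referenced in the next joined dataset
}

// ReferencedAcrossVersions returns the de-duplicated, sorted IDs referenced
// by ANY of the given versions, so that references held only by an
// intermediate version are not lost.
func ReferencedAcrossVersions(versions []EntityVersion) []string {
	seen := make(map[string]struct{})
	for _, v := range versions {
		for _, ref := range v.Refs {
			seen[ref] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for id := range seen {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}
```

Given six versions of one dependency entity whose references are spread over E1, E2 and E3 (which version points at which entity is assumed here, since the sketch drawing is not available as text), the function would return all three IDs, whereas looking at only one older version would miss the references held by the intermediate versions.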