akka/akka-persistence-dynamodb

Use lastEvaluatedKey?

Opened this issue · 3 comments

We could look into using response.lastEvaluatedKey and use that as exclusiveStartKey in query, instead of the timestamp for subsequent queries. Not sure how that works with GSI where the sort key isn't unique (same timestamp). If DynamoDB can keep track of the exact offset and not emit duplicates would not need the seen Map and that filter. Well, we still need it for the first query because we want the external offset to be TimestampOffset and that can include seen Map.

There is a corresponding FIXME for this.

Had a closer look at this, for a different reason. The paginated queries already use last evaluated key, so wanted to make sure that it wouldn't skip any events if it was the same timestamp at a page boundary. The last evaluated key for secondary indexes includes both the primary key from the index and the primary key from the base table. While it doesn't use the base table sorting, it looks like it sorts based on some hash of the table primary key — so it's a deterministic order, and will be able to do an exclusive start even if the index primary key is the same. So looks like we could use this in place of starting inclusive of the last timestamp and deduplicating.

Good to know. Is it worth changing? We kind of have to handle two ways if we use lastEvaluatedKey as mentioned above:

we still need it for the first query because we want the external offset to be TimestampOffset and that can include seen Map.

Maybe we just keep it as a nice to have future improvement? The cost benefit would be loading one less item in each such running query?

Maybe worthwhile, to simplify the implementation side. Could always be used in place of full seen map — if it becomes an offset that includes the full key (index and base table primary key) and this is stored. Could be encoded in TimestampOffset — would be a single entry in the seen map, with the pid and seq nr for exclusive start key.

A similar approach could also be used for r2dbc. We already extended the queries to start from seq nr for the initial timestamp (for the 1000 events issue). That was also looking to be schema compatible. If we updated the sorted index to include pid as well, then could do an exclusive start query there as well.

Maybe points to some abstraction leakage, if the offset can't be specific to the implementation. Also with needing to add TimestampOffsetBySlice in core library. But we could use the current TimestampOffset, with only a single entry in the seen map for exclusive start.

Makes sense as a possible future improvement. Can try it out later. And could also try with r2dbc.