apache/pekko-persistence-jdbc

Journal does logical and physical deletion in same transaction

Closed this issue · 12 comments

DefaultJournalDao.delete(persistenceId: String, maxSequenceNr: Long) does both logical and physical deletion in the same transaction. We did some load-testing recently (with PostgreSQL) and found that these operations both used a lot of resources.

There's probably a case for doing logical deletion in terms of performance, but the way it's implemented now, the logical deletion doesn't make sense, and should probably be removed.

DefaultJournalDao.delete

@mdedetrich probably not worth delaying the forthcoming RC for this. I'm on a day trip so can't check in detail. My gut reaction is that looking to improve the performance of the deletes is better than removing them. Maybe a missing db index. OP has not provided any details to back up the claims.

@mdedetrich probably not worth delaying the forthcoming RC for this. I'm on a day trip so can't check in detail. My gut reaction is that looking to improve the performance of the deletes is better than removing them. Maybe a missing db index. OP has not provided any details to back up the claims.

If by RC you mean the -M1 we plan on doing then yes I agree.

@magnho Is my understanding correct in that if we properly solve this issue it can be done without breaking API and your recommendation of removing .delete is just to prevent users from calling it since its misleading?

@mdedetrich I think he want to remove the logical deletion(this is a legacy issue), In fact, we also encountered this perf problem in production, which caused me to give up the jdbc plugin and use we own jdbc implementation instead.

@mdedetrich Sorry if I was unclear. The .delete is part of the JournalDao API, so that has to remain. I'm just questioning the implementation in DefaultJournalDao.

It currently works like this:

  1. Does an update marking all entries up to some sequence-nr as deleted=true in the database.
  2. Does a select to get the highest sequence-nr where deleted=true in the db.
  3. Performs the actual delete-query up to the sequence-nr it got in step 2.

This is all done in the same transaction, so the result of 1. has no effect outside of this transaction. This makes step 1 and 2 redundant.

Understood, in that case we can do it after the -M1 release as it won't break any API

@magnho It also goes without saying that you are free to fix this yourself and contribute a PR. By the time we get to releasing -M1 you might already be done as this doesn't seem to be a significant change, you just have to move logic from the logic delete into the actual delete query

@mdedetrich probably not worth delaying the forthcoming RC for this. I'm on a day trip so can't check in detail. My gut reaction is that looking to improve the performance of the deletes is better than removing them. Maybe a missing db index. OP has not provided any details to back up the claims.

Not advocating for removing the deletes, but very happy to get a discussion going here. The idea was that either the logical delete should be removed, or the logical delete is kept and the physical delete is moved. I don't have any clear suggestion for how the latter would be done.

@Roiocam would you have time to do a PR?

I don't see why logical delete should be default, although I may be missing something. Having said that I am getting the increasing feeling that this should be configurable? i.e. you configure persistence-jdbc whether to do logical delete or physical delete (and if you configure logical delete then all of the query operations also need to take into account the deleted=true flag).

On second thoughts we should release -M1 as is without this change because even though this change may not be breaking in strict API sense and changing it to logical delete shouldn't have any effect, it is a change in behaviour and a lot of people are going to be jumping onto 1.1.0-M1 straight from an existing setup for Slick 3.5.x/Scala 3.

In other words too many big changes in for 1.1.0-M1, should spread it out to get user feedback. Will set milestone but we can re-discuss if needed

The feature to enable logical deletion was already deprecated and removed in akka/akka-persistence-jdbc#569

The deleted flag though is still used to ensure the last sequence number isn't lost when deleting all events.
To get rid of that low level logical deletion, the plugin would have to keep a metadata table of the last sequence number of each persistence id similar how persistence-cassandra does it.

The other option to keep the deleted flag, we could optimize the delete by deleting up to n - 1, then mark the event n as deleted. That would avoid updating all rows before deleting them.

The deleted flag though is still used to ensure the last sequence number isn't lost when deleting all events. To get rid of that low level logical deletion, the plugin would have to keep a metadata table of the last sequence number of each persistence id similar how persistence-cassandra does it.

Thanks for the input, that's help!

The other option to keep the deleted flag, we could optimize the delete by deleting up to n - 1, then mark the event n as deleted. That would avoid updating all rows before deleting them.

I will finish this on #169