goldmansachs/reladomo

Tool to roll back entire database to an older processing date?

Opened this issue · 2 comments

motlin commented

This is more of a question than a feature request.

  • Is there a tool to roll back an entire database to an older processing date?
  • If not, how should I go about implementing it? Are there Reladomo apis that I ought to be using?
  • Does this idea make sense?

I have an application where all tables are unitemporal. Each day, I get a file representing what the current data ought to be, and I have an application which transforms the data into the right form, converts it into Reladomo lists, and then uses DelegatingList#merge() to update the database.

While actively developing the application, I find it helpful to manually test by running the new code on one day's worth of data. In order to run this sort of test, I take a backup of the entire database and restore it each time I want to test the application.

It occurred to me that it would be helpful to have a tool that's aware of processing date that can roll back all the data to a previous state. It seems like it could be faster than my manual approach. Plus I sometimes forget to take the database backup.

I think the implementation would be something like:

  • Iterate through every temporal table
    Delete all rows where the rollback date is within [in, out)
  • Delete all rows where rollback date < in
    Update systemTo to be infinity for all rows where out < rollback date
  • Update systemTo to be infinity for all rows where the rollback date is within [in, out)

If nothing like this exists but it makes sense to build, I'd be happy to work on it.

Thanks in advance!

[Sorry, this issue seemed to have fallen between the cracks]

I would be very careful formalizing this for a unitemporal usecase. In theory, the operation requested here is "illegal", that is, it cannot happen for real, production data. A "proper" rollback would create new entries in the chain, not a nuke of old entries.

Because this is not within the normal set of operations, I don't think it can be done efficiently with the existing API. The delete is straight forward (purgeAll on a list should do). The bulk update of processingDataTo should not work on a list object (but it may, as I don't recall any error checking around that).

For very large databases, this work may not be good to do in a single transaction (per table).

Thanks for the answer, I will try it out.