speedb-io/speedb

Trimming most recent timestamp history

Is your feature request related to a problem? Please describe.
When using timestamps, we have a use case where we need to remove the most recent history, that is, the versions of a key whose timestamps are above a threshold. The key should not be marked as deleted; instead, everything should behave as if the versions above the threshold had never been inserted, so that the values with lower timestamps become visible again. This must be possible atomically with other operations, to keep a consistent view of the DB while it is used from multiple threads (a WriteBatch is ideal for that).

// Create database with timestamp
rocksdb::DB * database = ...;

// Add history
database->Put(rocksdb::WriteOptions(), "key", "timestamp1", "value1");
database->Put(rocksdb::WriteOptions(), "key", "timestamp2", "value2");

// Reading back with newer timestamp gives us the most recent value
database->Get(rocksdb::ReadOptions{.timestamp = "timestamp3"}, "key"); // == "value2"

// Delete the last entry, but don't delete the key itself (wanted feature)
database->DeleteAt("key", "timestamp2");

// After that the previous value becomes visible again
database->Get(rocksdb::ReadOptions{.timestamp = "timestamp3"}, "key"); // == "value1"

Describe the solution you'd like
I want to be able to delete a specific set of key + timestamp pairs in a write batch (so that several such pairs can be deleted atomically), removing only those specific versions without marking the key itself as deleted.
It would also be fine to have a write batch call that deletes all keys with timestamps greater than or equal to a given value in timestamp-enabled column families, as long as regular operations on column families without timestamps can go in the same batch.
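
To make the requested semantics concrete, here is a minimal in-memory sketch. It is a toy model only, not the RocksDB/Speedb API: `VersionedStore`, `DeleteAt`, and `TrimFrom` are hypothetical names, timestamps are plain integers, and atomicity and column families are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Toy model of a timestamped key-value store (NOT the RocksDB/Speedb API).
class VersionedStore {
 public:
  using Timestamp = std::uint64_t;

  void Put(const std::string& key, Timestamp ts, std::string value) {
    history_[key][ts] = std::move(value);
  }

  // Returns the newest version whose timestamp is <= read_ts, if any.
  std::optional<std::string> Get(const std::string& key,
                                 Timestamp read_ts) const {
    auto it = history_.find(key);
    if (it == history_.end()) return std::nullopt;
    const auto& versions = it->second;
    auto v = versions.upper_bound(read_ts);  // first version with ts > read_ts
    if (v == versions.begin()) return std::nullopt;
    return std::prev(v)->second;
  }

  // Hypothetical: erase exactly one (key, ts) version without a tombstone,
  // so older versions become visible again.
  void DeleteAt(const std::string& key, Timestamp ts) {
    auto it = history_.find(key);
    if (it == history_.end()) return;
    it->second.erase(ts);
    if (it->second.empty()) history_.erase(it);
  }

  // Hypothetical: erase every version with timestamp >= threshold.
  void TrimFrom(Timestamp threshold) {
    for (auto it = history_.begin(); it != history_.end();) {
      auto& versions = it->second;
      versions.erase(versions.lower_bound(threshold), versions.end());
      it = versions.empty() ? history_.erase(it) : std::next(it);
    }
  }

 private:
  std::map<std::string, std::map<Timestamp, std::string>> history_;
};

// Mirrors the example from the problem description.
void RunIssueExample() {
  VersionedStore db;
  db.Put("key", 1, "value1");
  db.Put("key", 2, "value2");
  assert(db.Get("key", 3) == std::optional<std::string>("value2"));
  db.DeleteAt("key", 2);  // wanted feature
  assert(db.Get("key", 3) == std::optional<std::string>("value1"));
}
```

The point of the sketch is that `DeleteAt` and `TrimFrom` remove versions outright rather than writing a delete marker, which is what makes the lower-timestamp value readable again.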

Describe alternatives you've considered
As far as I know, there is currently no way to do this. We can't stop the database to call OpenAndTrimHistory. The workaround we use for now is to copy the value from the earlier timestamp to a newer timestamp, but that has side effects that need to be handled: versions no longer encode only changes (there can be duplicates), and we end up with values written at timestamps above our current "official" timestamp, which can lead to hard-to-track bugs. For example, if I "erase" my history from timestamp 3 back to timestamp 1 and then write at timestamp 2, the value at timestamp 3 and above will not be what I expect. If I use a delete instead, I get a different set of issues. The workaround also wastes disk space.
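
The copy-forward pitfall can be illustrated with a small self-contained model of a single key's history. This is only a sketch: the `Versions` alias and the helper names are hypothetical, and integer timestamps stand in for real ones.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>

// History of one key: timestamp -> value (toy model, not a real DB API).
using Versions = std::map<std::uint64_t, std::string>;

// Newest value with timestamp <= read_ts, if any.
std::optional<std::string> ReadAt(const Versions& versions,
                                  std::uint64_t read_ts) {
  auto it = versions.upper_bound(read_ts);
  if (it == versions.begin()) return std::nullopt;
  return std::prev(it)->second;
}

// Workaround: "roll back" from ts 3 to ts 1 by copying the old value forward,
// then write a new value at ts 2. Returns what a reader at ts 3 sees.
std::string ReadAfterCopyForward() {
  Versions key = {{1, "value1"}, {3, "value3"}};
  key[3] = *ReadAt(key, 1);  // copy of value1 now lives at ts 3
  key[2] = "value2";         // later write below the stale copy
  return *ReadAt(key, 3);    // the stale copy at ts 3 shadows the ts 2 write
}

// Requested behavior: the ts 3 version is actually removed before the write.
std::string ReadAfterTrim() {
  Versions key = {{1, "value1"}, {3, "value3"}};
  key.erase(key.lower_bound(2), key.end());  // drop versions with ts >= 2
  key[2] = "value2";
  return *ReadAt(key, 3);  // the ts 2 write is the newest visible version
}
```

In this model, `ReadAfterCopyForward()` yields the copied `"value1"` while `ReadAfterTrim()` yields `"value2"`, which is the discrepancy described above.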