ipfs/go-datastore

Define Delete to be idempotent

Closed this issue · 15 comments

Moving from: ipfs/go-ds-flatfs#30

  1. We usually only care if the operation succeeded.
  2. We definitely don't want to abort a batch operation because a delete failed.
  3. User code often just aborts/returns errors without thinking about them.
  4. Not all datastores can tell us if there was nothing to delete (e.g., badger).
  5. Flatfs no longer reliably tells us this.
  6. HTTP delete is defined to be idempotent so many web-backed datastores won't be able to fulfill this contract either. (also, there's a reason HTTP delete is idempotent...).

This boils down to:

  1. We can't reliably return ErrNotFound on delete making it useless except, maybe, as a source of metrics.
  2. Having to handle ErrNotFound on delete in user code is annoying.
  3. Having to try to return ErrNotFound on delete is annoying at best.

Proposal: Define it to be idempotent (i.e., return nil when the key isn't found).

For example, see: https://github.com/ipfs/go-datastore/pull/103/files#diff-42ea5667b327bb207485077410d5f499R42. I could fix that, but it would require a bunch of extra state I don't really want to track (and it would cause a failed delete to fail the entire batch.

I don't have a strong objection. I would prefer a way to communicate back that a key is not found if that information is known, but as this may complicate things, this is not a hard requirement.

The user can still call Has first. I agree that exposing all information we know is a good idea, it just makes life hard in practice (in this case).

raulk commented

Generally, I agree. DELETEs are idempotent in SQL, leveldb, badger, redis, but some databases will report the number of keys deleted. Maybe we should add capability for returning an (int, error), where the former can take value -1 if the DB does not support this feature?

For the record, go-ds-badger does not return ErrNotFound on deleting an inexistent key; it simply forwards the error from badger, which does not complain in this case.

  1. We definitely don't want to abort a batch operation because a delete failed.

In most of cases, yes. But ultimately this depends on the application logic. For example, in the Datastore TTL shim I've half-baked, we have two keys per TTL entry (for good reasons), and we might want deletes pertaining to a single TTL entry to be as atomic as possible.

In most of cases, yes. But ultimately this depends on the application logic.

Ah, sorry, I meant we don't want to abort if the delete failed because the item wasn't there. We probably do want to abort on other errors.

Maybe we should add capability for returning an (int, error), where the former can take value -1 if the DB does not support this feature?

So we'd return:

  • 1 - We deleted something.
  • 0 - Nothing happened.
  • -1 - No idea.

(where we'd only return an error if something went wrong)

Personally, I'm not sure if implementing that is worth the work/complexity. Making DELETE idempotent is quite easy:

  1. Slowly remove all cases where we return ErrNotFound on delete. We're already failing to do this in some cases so this won't break anything.
  2. Eventually, define DELETE to be idempotent.
  3. Finally, remove all checks for ErrNotFound on delete.

Basically, we can do this slowly without having to fix everything up-front. Changing the method signature, on the other hand, will take a bit more work.

We could still do it, I just want to make sure the change is well motivated (i.e., we have a use-case where we actually care about this). If we do run into a case where we'd like to know, we can change the signature later (well, it'll be a breaking change but it'll be a breaking change either way).

@Stebalien I agree this is over-complicating things. I think the best to not worry about for now. A better way to handle this is to query the datastore afterwards for the status of the last operataion, but this will require the use of handles, which is something we may want to do anyway down the road.

bigs commented

i like @raulk's suggestion about returning the number of keys deleted... DBs that support versioning (badger) could potentially delete more than one key. so i'd think of it as <0, 0, >0

@raulk @kevina @bigs so, are you fine making delete idempotent (for now) where, sometime in the future, we could consider adding a delete count? Really, I'm just trying to (a) simplify some code that tries to hard to track whether or not the operation succeeded and (b) trying to nail down semantics (I hate undefined behavior).

@Stebalien that fine with me.

bigs commented
raulk commented

Idem. I have no immediate use case right now for the delete count.

SGTM, in my book Delete just means that following calls to Get are the ones that will actually return the ErrNotFound error, no guarantees of what was there before issuing the call.

I was bitten by this one. In particular, the Delete function of MapDatastore in this repository is not idempotent. As it is often used to implement a test suite, it also force in some situation to deal with this case.

Another one not idempotent is flatfs.

All done!