holepunchto/hyperbee

Real deletes?


An append-only b-tree, but the API has a method for deletes. How do you achieve deletes when the underlying datastructure is append-only?

You append a change that stops linking to the data you wanna delete 😊

You can then gc the deleted block from the log afterwards if you want. The only thing left then is the hash of the block. If you also delete the neighbouring block, only their parent hash would be left, and so on.
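A toy model of that idea, as a minimal sketch: a log keeps a hash per block and folds them pairwise into a root. Clearing a block drops its payload but keeps its hash, so the root (and therefore verification) is unchanged; once two sibling blocks are cleared, their single parent hash is enough. This is not Hypercore's actual tree layout or API. `ToyLog`, `clear` and `merkle_root` are hypothetical names, and the odd-node rule is an assumption chosen for simplicity.

```python
import hashlib
from typing import List, Optional


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(hashes: List[bytes]) -> bytes:
    """Fold a level of hashes pairwise into one root.
    Toy rule (assumption): an odd node out is carried up unchanged."""
    if not hashes:
        return sha256(b"")
    level = list(hashes)
    while len(level) > 1:
        nxt = [sha256(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


class ToyLog:
    """Hypothetical append-only log: payloads plus one leaf hash per block."""

    def __init__(self) -> None:
        self.data: List[Optional[bytes]] = []
        self.hashes: List[bytes] = []

    def append(self, block: bytes) -> int:
        self.data.append(block)
        self.hashes.append(sha256(block))
        return len(self.data) - 1

    def clear(self, index: int) -> None:
        # Forget the payload locally; the leaf hash stays so the log still verifies.
        self.data[index] = None

    def root(self) -> bytes:
        return merkle_root(self.hashes)


if __name__ == "__main__":
    log = ToyLog()
    for block in [b"a", b"b", b"c", b"d"]:
        log.append(block)
    before = log.root()

    log.clear(1)
    assert log.data[1] is None and log.root() == before

    # With blocks 0 and 1 both cleared, their parent hash alone reproduces the root.
    log.clear(0)
    p01 = sha256(log.hashes[0] + log.hashes[1])
    p23 = sha256(log.hashes[2] + log.hashes[3])
    assert merkle_root([p01, p23]) == before
```

The point of the sketch is that gc removes data, not history: anyone holding the root can still tell that *something* was at index 1, just not what.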

Can peers still download that block? Can you identify peers who are still exchanging that block?

I guess what I'm asking is: does db.del() do what you suggest to achieve deletes, or does it simply append another block asking clients to consider an earlier block deleted?

Nuhvi commented

Blocks (analogous to items in an array) don't have the concept of deletion. Deletion in Hyperbee applies to nodes of the B-tree. And you are correct: it is just an operation saying this key is now deleted, but the history of all 'put' and 'del' operations is still available locally and to every peer that downloaded that history.
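The distinction can be sketched with a toy key/value view over an append-only operation log. This is a simplification under stated assumptions, not Hyperbee's real B-tree encoding: `ToyBee`, `delete` and `as_of` are hypothetical names. `delete()` only appends a tombstone, so the current view hides the key while an older view of the same log still returns the value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Op:
    kind: str                    # "put" or "del"
    key: str
    value: Optional[str] = None  # tombstones carry no value


class ToyBee:
    """Hypothetical key/value view over an append-only op log."""

    def __init__(self) -> None:
        self.log: List[Op] = []

    def put(self, key: str, value: str) -> None:
        self.log.append(Op("put", key, value))

    def delete(self, key: str) -> None:
        # Appends a tombstone; earlier entries are never rewritten.
        self.log.append(Op("del", key))

    def get(self, key: str) -> Optional[str]:
        # The most recent op for the key decides what the current view shows.
        for op in reversed(self.log):
            if op.key == key:
                return op.value if op.kind == "put" else None
        return None

    def as_of(self, length: int) -> "ToyBee":
        # A read-only view of the log truncated to an earlier length.
        old = ToyBee()
        old.log = self.log[:length]
        return old


if __name__ == "__main__":
    bee = ToyBee()
    bee.put("k", "secret")
    bee.delete("k")
    assert bee.get("k") is None               # current view: deleted
    assert bee.as_of(1).get("k") == "secret"  # history: still there
```

Anyone who replicated the first entry can rebuild the older view, which is exactly the concern raised in the question.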

True deletion in P2P systems is impossible, and arguably it is impossible anywhere on the internet; but when every entry is signed by your key, you can't even deny that you once wrote it.

"True deletion" is a complex misdirection of what we can and can't achieve in P2P systems. Yes, it's impossible to modify an append-only log, but it's possible for swarms to die or cease, causing archives to effectively disappear. I don't think it's ethical to provide methods like .del() while deleted blocks continue to replicate. That's not just "no true deletion" but worse deletion than you can get via HTTP federation, much less a centralized service. I know it takes being clever to do better than that with append-only data structures, but it's not right either to treat this like a negligible detail.

@garbados Your comment is mostly denial of the reality of @nazeh's. There is only local deletion as a certainty in all data in all systems. If you want to bikeshed whether this should be called something more subtle, like "purge" or "remove" or whatever, go for it. But there is no cleverness that can transcend physics.

The delete operation clearly indicates to any peer that the block is gone and that they can gc it (with a .clear op, not yet in 10 but WIP). There are no guarantees that they do so, similar to users taking screenshots of deleted Twitter threads.


I really don't appreciate this, @BitcoinErrorLog. I've been working on and with P2P tech for ages and I've heard this song and dance countless times. It remains irresponsible to pretend this behavior is a negligible detail, or that the difficulty of the task is worth abandoning the effort. If you think it's impossible, get cleverer. It isn't. Or accept that these systems are only suitable for apps that don't require deletions, say research datasets, and little else.

At the risk of spilling words on deaf ears, please understand that "true deletion" is a misdirection to avoid confronting what we expect from deletion within an app. In general, I expect that a system will erase the data from disk, and that compliant peers will do so as well. I can't make guarantees about non-compliant peers, as no system can. Even in centralized systems, once data hits someone's machine, all bets are off. As @mafintosh pointed out, you can just take a screenshot. But you can still delete a tweet -- a kind of "future-forward" deletion in which future attempts to retrieve the data fail -- and that's good enough for users writ large.

My concern here, then, is that if it is possible to achieve this behavior with compliant peers, Hyperbee should do so by default. If it doesn't, then .del() is a misnomer that should at the very least carry a bold disclaimer that it does not actually attempt to make data in the shared data structure irretrievable. I would rather Hyperbee include logic to gc the block as part of the .del() op, and that peers using Hyperbee which observe a .del() op gc it as well. That would be good enough for me.

I agree we should have an option to do the clear with del, once clear lands in 10.
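The proposal (clear on del for the writer, gc on observing a del for compliant peers) can be modeled with a small simulation. This is a minimal sketch under explicit assumptions: `Peer`, `honours_deletes`, `publish` and `clear_key` are hypothetical names, the writer's `honours_deletes` flag stands in for the "clear with del" option discussed above, and nothing here mirrors Hypercore's or Hyperbee's real replication or `.clear` API. It shows what the thread agrees on: compliant peers drop the payload, a non-compliant peer keeps it, and the log itself still grows by one tombstone.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    kind: str               # "put" or "del"
    key: str
    value: Optional[bytes]  # None once the payload has been cleared


@dataclass
class Peer:
    """Hypothetical replica; honouring deletes is the peer's choice."""
    name: str
    honours_deletes: bool = True
    blocks: List[Block] = field(default_factory=list)

    def receive(self, block: Block) -> None:
        # Keep a private copy so clearing here never touches another peer's data.
        self.blocks.append(Block(block.kind, block.key, block.value))
        if block.kind == "del" and self.honours_deletes:
            self.clear_key(block.key)

    def clear_key(self, key: str) -> None:
        # Drop earlier payloads for the key; the block slots themselves remain.
        for b in self.blocks:
            if b.kind == "put" and b.key == key:
                b.value = None

    def payloads(self, key: str) -> List[bytes]:
        return [b.value for b in self.blocks if b.key == key and b.value is not None]


def publish(writer: Peer, swarm: List[Peer], block: Block) -> None:
    """Writer appends locally, then every connected peer replicates the block."""
    writer.receive(block)
    for peer in swarm:
        peer.receive(block)


if __name__ == "__main__":
    writer = Peer("writer")  # clear-on-del enabled
    good = Peer("good")
    rogue = Peer("rogue", honours_deletes=False)
    swarm = [good, rogue]

    publish(writer, swarm, Block("put", "k", b"secret"))
    publish(writer, swarm, Block("del", "k", None))

    assert writer.payloads("k") == [] and good.payloads("k") == []
    assert rogue.payloads("k") == [b"secret"]  # no protocol can force this peer
```

Making the writer-side clear a flag rather than hard-wiring it matches the open question above about whether it should be the default.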

Whether that's the default or not probably depends; let's see.

@garbados appreciate the input! will ping back once that lands 😊

Closing and locking, since this is getting a bit heated.