ncruces/go-sqlite3

Port the Checksum VFS


This VFS raises a few very interesting issues.

This VFS needs to detect full page reads/writes, and so needs to be at the very top of the VFS stack: it needs to be the VFS that receives direct (or unmodified) calls to xRead and xWrite. Saying it can wrap others but can't be wrapped is a tough proposition.
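To make the "full page" requirement concrete, here is a minimal sketch of what such a layer does on xRead and xWrite. It assumes the last 8 bytes of every page are reserved for the checksum, and it uses an illustrative Fletcher-style sum that is not guaranteed to match the algorithm in SQLite's cksumvfs. The names `isFullPage`, `sealPage` and `verifyPage` are hypothetical, not part of any library.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// reserved is the number of bytes at the end of each page that hold the
// checksum (assumption for this sketch).
const reserved = 8

var errChecksum = errors.New("page checksum mismatch")

// isFullPage reports whether an I/O of n bytes at offset off looks like a
// whole page: a power-of-two size in [512, 65536], aligned to its own size.
// If a layer above splits or transforms the I/O, this test no longer holds,
// which is why the checksum layer wants unmodified calls.
func isFullPage(n int, off int64) bool {
	return n >= 512 && n <= 65536 && n&(n-1) == 0 && off%int64(n) == 0
}

// checksum is an illustrative Fletcher-style sum over little-endian words.
func checksum(data []byte) (sum [reserved]byte) {
	var s1, s2 uint32
	for i := 0; i+8 <= len(data); i += 8 {
		s1 += binary.LittleEndian.Uint32(data[i:]) + s2
		s2 += binary.LittleEndian.Uint32(data[i+4:]) + s1
	}
	binary.LittleEndian.PutUint32(sum[0:], s1)
	binary.LittleEndian.PutUint32(sum[4:], s2)
	return sum
}

// sealPage stores the checksum in the reserved tail before a full-page write.
// Partial writes are left untouched.
func sealPage(page []byte, off int64) {
	if !isFullPage(len(page), off) {
		return
	}
	body := len(page) - reserved
	sum := checksum(page[:body])
	copy(page[body:], sum[:])
}

// verifyPage checks the reserved tail after a full-page read.
// Partial reads are not verified.
func verifyPage(page []byte, off int64) error {
	if !isFullPage(len(page), off) {
		return nil
	}
	body := len(page) - reserved
	sum := checksum(page[:body])
	if !bytes.Equal(sum[:], page[body:]) {
		return errChecksum
	}
	return nil
}

func main() {
	page := make([]byte, 4096)
	for i := range page {
		page[i] = byte(i)
	}
	sealPage(page, 0)
	fmt.Println("after write:", verifyPage(page, 0))

	page[100] ^= 1 // simulate silent corruption of a single bit
	fmt.Println("after corruption:", verifyPage(page, 0))
}
```

Note that a partial read passes through unverified, so anything that changes the shape of the I/O before it reaches this layer silently disables the check.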

Also, if a file with checksums is updated by any other means, it then looks corrupt to the VFS. So this feature really should be core to the library. Alas, it is not.

Still, that's why the loadable extension puts itself at the top of the stack, auto-detecting which files have checksums.

My current view is to make this core to this wrapper, ensuring files with checksums update them by default, and providing knobs to disable checksums when so desired.

But there's more. The encryption VFSes also want to be at the top of the stack, though they don't strictly need to be. We need to fix that by advancing the concept of a “wrapping” VFS. The challenge is to help a rollback journal or WAL file “find” the corresponding database file of the same type (at the same level of wrapping). Basically, vfs.Filename.DatabaseFile() should return something useful in all cases (the Go VFS file at the top of the stack, even if that's wrapped by something on the C side), and from there we should be able to get to the wanted file object.

The encryption VFSes now tolerate not being at the top of the stack:

if f, ok := vfsutil.UnwrapFile[*hbshFile](name.DatabaseFile()); ok {
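For readers unfamiliar with the pattern, the idea behind that call can be sketched in a self-contained way: walk down a chain of wrapping files until one of the wanted concrete type turns up. Everything below (`File`, `wrapper`, `unwrapFile`, `cryptFile`, `logFile`) is a hypothetical stand-in for illustration, not the actual `vfsutil` implementation or the real `vfs.File` interface.

```go
package main

import "fmt"

// File is a tiny stand-in for a VFS file interface (hypothetical).
type File interface{ Name() string }

// wrapper is implemented by files that wrap another file (hypothetical).
type wrapper interface{ Unwrap() File }

// unwrapFile walks the wrapping chain from the top of the stack down,
// returning the first file whose concrete type is T.
func unwrapFile[T File](f File) (T, bool) {
	for f != nil {
		if t, ok := f.(T); ok {
			return t, true
		}
		w, ok := f.(wrapper)
		if !ok {
			break // reached a file that wraps nothing
		}
		f = w.Unwrap()
	}
	var zero T
	return zero, false
}

// osFile plays the role of the bottom of the stack.
type osFile struct{ name string }

func (f *osFile) Name() string { return f.name }

// cryptFile plays the role of an encryption layer.
type cryptFile struct {
	File
	key string
}

func (f *cryptFile) Unwrap() File { return f.File }

// logFile plays the role of some other layer stacked on top.
type logFile struct{ File }

func (f *logFile) Unwrap() File { return f.File }

func main() {
	// The encryption layer is not at the top of the stack here.
	var top File = &logFile{&cryptFile{File: &osFile{"test.db"}, key: "secret"}}

	if c, ok := unwrapFile[*cryptFile](top); ok {
		fmt.Println("found encryption layer for", c.Name())
	}
}
```

With something like this, a journal or WAL file only needs a handle to the top-of-stack database file. Each layer can then locate its own counterpart regardless of what has been stacked above it.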

I'm not sure I want to support VFSes implemented in C.

In general, it is a goal to allow C extensions to be built in (that gets us github.com/asg017/sqlite-vec, and FTS5, and R*Tree, GeoPoly, etc.), but VFSes are pretty special (see footnote 1), and I definitely assume the default VFS is the "os" VFS in a bunch of places, and that only Go VFSes are ever registered.

I guess I'll punt that complexity until someone asks for RBU or something like that.

Footnotes

  1. They're the only thing that's managed globally, not per-connection, and with late binding (the name identifies it). I have no idea what I'd do with a C VFS that only exists inside a single module instance. One “fix” I've considered is to eagerly bind VFSes when a connection is opened.

So if this could be done in Go, with the public API, this would be it.
But as said above, this probably needs to be at the top, in the default VFS.

I decided not to hold 3.47.0 on this any longer, and against coupling the driver too much with it (implementing it in the default VFS, mostly in C, etc).

I guess this article, however misguided, kinda changed my mind.
This VFS is still useful. It's about dealing with silent data corruption, not silent loss of durability.

The implementation in #176 is sound, though, and works for those who want it. It's not the default VFS, but you can still choose it. It needs to ensure it's the first VFS in the stack when used, but I think that can be done.