bitcoindevkit/bdk

Introduce TLV for bdk_file_store

Opened this issue ยท 10 comments

Sub-task for #1103

Okay technically we don't need the T since we always append the same type. I propose the following:

Encoding

Encode each element as:

[Version: VarInt] [Length: VarInt] [Data: Bytes]

Where VarInt is defined here: https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer.

VersionedElement trait

pub trait VersionedElement: Sized {
    /// Decode the `data` of given `version`.
    fn decode_with_version(version: u64, data: &[u8]) -> Result<Self, VersionedDecodeError>;

    /// The version to encode the data with.
    fn encode_version() -> u64;
}

pub enum VersionedDecodeError {
    UnknownVersion(u64),
    Bincode(bincode::Error),
}

I did a lot of thinking about this.

Backwards compat

/// Persists an append-only list of changesets (`C`) to a single file.
#[derive(Debug)]
pub struct Store<C, V = C> {
    magic_len: usize,
    db_file: File,
    marker: PhantomData<(V, C)>,
}

impl<V, C> Store<V, C>
where
    C: Into<V> + Append,
    V: Into<C> + serde::Serialize + serde::de::DeserializeOwned,

As you can see there is a new type parameter V -- this is the type that is serialized to and from the actual disk.

So in practice you'd have:

pub struct MyChangeSetV0_17_0 {... }
pub struct MyChangeSet { ... }

// serde serde etc
pub enum VersionedMyChangeSet {
    V0_17_0(MyChangeSetV0_17_0)
    Current(MyChangeSet)
}

impl From<MyChangeSet> for VersionedMyChangeSet {
     fn from(value: VersionedMyChangeSet) -> MyChangeSet {
         // do the obvious
     }
}

Then you'd set V as VersionedMyChangeSet and C as MyChangeSet.

If we do have forwards compatibility we need a length prefix. This can be done by just encoding each entry into a Vec<u8> and bincode encoding the result to the file.

This makes backwards compat optional and straight forward to achieve. You just have to nudge users to create this enum. For wallet we can create VersionedChangeSet and do all this without depending on bdk_file_persist I'm sure it's likely to be useful in the future anyway.

When making new changesets for existing types (which for the record I actually think we can avoid) we keep the old one around of course and put ChangeSetV0_WHAT_EVER at the end. bdk_wallet will probably use this in turn to develop a new changeset when it goes to v2.

Forwards Compat

To do forwards compat we just write the length of each entry before we serialize it. This can be done in a couple of ways. e.g. https://docs.rs/bincode/latest/bincode/config/trait.Options.html#method.serialized_size or by just serialzing to a Vec<u8> and serializing that.

Now that we have the length, when we decode an entry to a Vec<u8>, we try and decode as V. If it fails we strip the enum varint (tell bincode to just read an int), and then just try to decode as C, if that fails we finally throw an error.

So this gives you forwards compat as long as whatever the future V is that has been written encodes first a varint and then something that C can partially decode. This would be the case if the new version has just added an extra optional field.

For what it's worth, I discussed with Evan about things we had as breaking changes coming up or speculated and none of them actually were compatible with forwards compat (none of them would be just adding an optional field).

I agree with @LLFourn's suggestion. I think someone should take this ticket on.

I later realized the "Forwards Compat" bit is not totally sound. Before trying to do a decode on a higher versioned entry you need to actually check it is higher versioned not just assume so because it failed. It could be a corrupted lower version entry or something else has gone wrong. This means we'd need to know the current version.

It's actually kinda possible to do this automatically with bincode if you require Default for V and just encode the Default and see what int it encodes as to get the implicit version number. Since we already enforce Default for Append, then V should do it too. A little hacky but it does give us the feature. I'd be tempted to not actually implement this just yet but leave forwards compat later. I fully realize that "let's do forwards compat later" sounds like the famous last words.

I agree with @LLFourn's suggestion. I think someone should take this ticket on.

I would like to give it a try.

@nymius I've assigned this to you, thanks for jumping in to take this on.

Can we push this TLV work for the bdk_file_store out of the 1.0.0-alpha milestone?

This would let the team focus on #1496 and #1498 changes and we'd only deliver backward compatibility for the beta with bdk_sqlite. The examples would also have to be changed to use bdk_sqlite but that should be faster/easier to review than redesigning bdk_file_store.

I discussed this with @evanlinjin and decided this issue can be moved to the 1.0.0-beta milestone. We just need to warn beta users that if they want backward data file compatibility they'll need to use the bdk_sql until this issue is done.

I'd like to know what @nymius and @evanlinjin actually think about whether it's a good idea to actually do this. Before I speculated about how to this but didn't really consider too deeply whether or not it's actually a good idea. We could just leave bdk_file_store as a low dependency ultra-minimal example of how to store and retrieve changesets. It's helpful in development before you commit to maintaining a specific backend for your users. Once we start making it forwards/backwards compatible we are implying that it can be used in production. I have no real fear about the design being used in production. I am concerned that bringing this to the level of quality needed to be confidently used in production might be wasted since I haven't heard of people wanting to use this? All the extra code also detracts from the its high-level pedagogical value since it's mostly about dealing with the particulars of bincode.

nymius commented

I didn't considered the legitimacy of this as a real production case either. It seemed a good case just for the comments on #1103.
Besides its utility I think I'm closer in the implementation to what you and evan underpinned in the comments, I only lacked the time to finish it last week.
From a pedagogical point of view, the code certainly gets more complex and probably hides what you are trying to "teach".
Looking at its production application, what are the alternatives? sqlite? sqlx? what benefits they bring in relation to bincode? Thinking about redundancy for example sqlite and file-store would have the same method: duplicate the file and copy it in another location.
One drawback I can think of is that Bincode is harder to recover if the file brakes, because we cannot be sure were the error occurred just from the file.
Something bincode doesn't have that SQLite does, is the ability to record snapshots, without having to implement any extra logic, for example. Something PostgreSQL is capable too.
Considering the number of operations database engines can perform that file-store would have to implement itself, I would consider it production unsuitable, as these operations usually improve the reliability of the application, and for file-store would have to be implemented from scratch with all the costs this involves.