Level/community

Merge levelup into abstract-leveldown

Closed this issue · 11 comments

This is just an idea at this point, briefly discussed offline; we're not committing to anything. The premise: make any abstract-leveldown implementation a standalone, ready-to-go database.

Refactorings over the past years have increased modularity and allowed us to clean up various internals. That said, now that we have, the current architecture, with its many layers, has downsides:

image

  • Adding custom behavior to e.g. level means having to peel off layers first, then add them back (most notably in subleveldown - that code is gnarly)
  • The layers share API surface with treacherously subtle differences (e.g. leveldown by itself returns Buffers by default, level(up) returns strings by default).
  • There are 4 forms of "encoding" now: encoding-down, serialization, asBuffer, and whatever the underlying storage does (e.g. IDB returning a Buffer as Uint8Array). Yet for an abstract-leveldown implementation there's no builtin primitive to decode/deserialize data.
  • Breaking changes in one layer typically bubble up, leading to what we call "the release dance". Prior to that, canary testing is often required because we can't foresee everything.
  • A rabbit hole of documentation and changelogs (level links to levelup links to encoding-down links to level-codec gives an example about leveldown links to level - you get the point)
  • Double allocation of callback functions, options objects, etc.
  • No builtin mechanism for asynchronous hooks (this could be achieved as another abstract-leveldown layer, but that doesn't make it available to levelup per se) (related: #44).
  • Implementing manifests (#42) is hard.
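To make the "subtle differences" and "double allocation" bullets a bit more concrete, here's a rough sketch with two toy layers. `ToyStore` and `ToyWrapper` are hypothetical stand-ins (not leveldown or levelup), the API is synchronous for brevity, and the real option handling is more involved. It only shows how two layers can share method names while disagreeing on defaults:

```javascript
'use strict'

// Toy stand-in for a storage layer: returns Buffers unless asked otherwise.
class ToyStore {
  constructor () {
    this._data = new Map()
  }

  put (key, value) {
    this._data.set(String(key), Buffer.from(String(value)))
  }

  get (key, options = {}) {
    const asBuffer = options.asBuffer !== false // Buffers by default
    const value = this._data.get(String(key))
    if (value === undefined) throw new Error('NotFound')
    return asBuffer ? value : value.toString()
  }
}

// Toy stand-in for a wrapping layer: same method names, strings by default.
class ToyWrapper {
  constructor (store) {
    this._store = store
  }

  put (key, value) {
    this._store.put(key, value)
  }

  get (key, options = {}) {
    // A second options object is allocated on every call, on top of
    // whatever the caller already passed in.
    const asBuffer = options.asBuffer === true
    return this._store.get(key, { ...options, asBuffer })
  }
}

const store = new ToyStore()
const db = new ToyWrapper(store)
db.put('a', 'hello')

const fromStore = store.get('a') // a Buffer
const fromWrapper = db.get('a') // a string
```

Same `get('a')` call, two different return types, depending on which layer you happen to hold.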

Back to the idea: there are many open questions. Making a POC is probably the best way forward. To be clear: I don't want it to be a big batteries-included database. I think what we need are simpler primitives to enable composing (and maintaining!) such a database. Some "batteries" could be worth including, if they provide a foundation for userland modularity.

We have been recommending that people use levelup, and more so level and friends, which is a clear indication that much of the functionality belongs in a single core component. To put it another way: a single shell (the public API of abstract-leveldown) around a nut (the private API of abstract-leveldown).
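A minimal sketch of the shell-and-nut shape, assuming a callback API. `AbstractStore` and `MapStore` are made-up names, and callbacks fire synchronously here to keep it short; this is not the actual abstract-leveldown code:

```javascript
'use strict'

class AbstractStore {
  // Public "shell": normalizes and validates arguments once,
  // so that implementations don't have to.
  get (key, options, callback) {
    if (typeof options === 'function') {
      callback = options
      options = {}
    }
    if (typeof callback !== 'function') {
      throw new TypeError('get() requires a callback argument')
    }
    if (key === null || key === undefined) {
      return callback(new Error('key cannot be null or undefined'))
    }
    this._get(key, options || {}, callback)
  }

  // Private "nut": the only thing an implementation overrides.
  _get (key, options, callback) {
    callback(new Error('NotFound'))
  }
}

class MapStore extends AbstractStore {
  constructor (entries) {
    super()
    this._map = new Map(entries)
  }

  _get (key, options, callback) {
    if (!this._map.has(key)) return callback(new Error('NotFound'))
    callback(null, this._map.get(key))
  }
}

const mapDb = new MapStore([['a', '1']])
mapDb.get('a', (err, value) => console.log(err, value)) // options omitted
```

The idea would be that whatever levelup adds today moves into the shell, so every implementation gets it without an extra wrapper.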

I'm all for simplifying. There's also an issue of performance. Maybe we could benchmark the hell out of level and compare with a benchmark done on a POC.

  • What would this mean for current implementations of abstract-leveldown?
  • Are we aiming for a smooth transition for implementations?
  • How would encodings fit in? (it must be easy to pull in e.g. bytewise and charwise).

Love the picture btw. This style is really nice. We should reuse it for documentation, blog posts, etc. How did you make it?

Photoshop. I wanted to try its 3D features. And I'm never gonna do it again, lol. Took 10 crashes and retries and then 20 minutes waiting time just to render some damn plastic material 😄

Some illustrated thoughts about encodings (take with a grain of salt).

I was thinking about the benefits of keeping encoding-down as-is. It might be a more flexible building block in scenarios where you need to bypass encoding, or perform encoding early on. For example, when you have a server and client that both read from a db, and data must be encoded prior to transport from client to server. In addition, the server has a separate sublevel. Along the lines of:

image

It gets more complicated when you introduce manifests. Let's say server-side you are indexing data (with some fictional module that is also an abstract-leveldown implementation), and you want the client to be able to read from the index through a custom query() method. That method is advertised through a manifest. And the first (key-like) argument of the method must be encoded.

image

So what would it look like if encodings were builtin in abstract-leveldown? Taking a get operation as an example. The multilevel(down) client could say: it's on me to parse the value as JSON when I get it, and I can parse strings, so I'll pass down { valueEncoding: 'utf8' }.

image

This could be optimized perhaps, by using the id encoding downstream, avoiding String(String(..)).
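Roughly what that get flow could look like in code. Everything here is a toy: `ToyDb`, `JsonClient` and the `codecs` table are hypothetical, the API is synchronous, and only the `valueEncoding` option name comes from the discussion above:

```javascript
'use strict'

// Hypothetical codec table. The names mirror common Level encodings,
// but these objects are stand-ins, not level-codec.
const codecs = {
  utf8: { encode: (v) => String(v), decode: (v) => String(v) },
  id: { encode: (v) => v, decode: (v) => v }
}

// A db with encodings built in: each call says which codec to apply.
class ToyDb {
  constructor () {
    this._store = new Map()
  }

  put (key, value, options = {}) {
    const codec = codecs[options.valueEncoding || 'utf8']
    this._store.set(key, codec.encode(value))
  }

  get (key, options = {}) {
    const codec = codecs[options.valueEncoding || 'utf8']
    return codec.decode(this._store.get(key))
  }
}

// The client takes care of JSON itself and only asks the db for strings.
class JsonClient {
  constructor (db) {
    this._db = db
  }

  put (key, value) {
    this._db.put(key, JSON.stringify(value), { valueEncoding: 'utf8' })
  }

  get (key) {
    return JSON.parse(this._db.get(key, { valueEncoding: 'utf8' }))
  }
}

const rawDb = new ToyDb()
const client = new JsonClient(rawDb)
client.put('k', { n: 1 })
const roundTripped = client.get('k')
```

Since the client already hands over a string, passing `'id'` instead of `'utf8'` in this toy would skip the redundant `String()` call on the way down.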

The indexing scenario would be similar (with a pseudocode argEncoding option):

image

What bothers me though, is that encoding options can't describe what you have, they describe what you want (to happen). What if we split that, taking inspiration from HTTP's content negotiation?

image

In the indexing scenario, the argEncoding option would describe which encoding was already applied to the argument, similar to HTTP's Content-Encoding:

image
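A sketch of the "describe what you have" half, under heavy assumptions: `ToyIndex` is fictional, `query()` and `argEncoding` are the pseudocode names from above, and `'none'` (meaning "not encoded yet") is something I made up for the example:

```javascript
'use strict'

// Toy json codec, a stand-in for a real one.
const json = {
  encode: (value) => JSON.stringify(value),
  decode: (text) => JSON.parse(text)
}

class ToyIndex {
  constructor () {
    this.keyEncoding = 'json' // how keys are stored internally
    this._rows = new Map()
  }

  add (key, value) {
    this._rows.set(json.encode(key), value)
  }

  // `argEncoding` states which encoding the caller has already applied,
  // similar in spirit to HTTP's Content-Encoding header.
  query (arg, options = {}) {
    const have = options.argEncoding || 'none'
    // If the argument already has the stored encoding, use it as-is.
    const storedKey = have === this.keyEncoding ? arg : json.encode(arg)
    return this._rows.get(storedKey)
  }
}

const index = new ToyIndex()
index.add(['user', 1], 'alice')

// Server-side caller: has a plain value, lets the index encode it.
const direct = index.query(['user', 1])

// Remote client: encoded the argument before transport and says so.
const remote = index.query('["user",1]', { argEncoding: 'json' })
```

Both calls land on the same row; the second one avoids a decode and re-encode on the server because the option describes the data rather than requesting a conversion.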

Are we aiming for a smooth transition for implementations?

I say yes. The first version can and should be a drop-in replacement, to keep all the history and get regression tests for free.

Rough roadmap

  • Increase API parity between abstract-leveldown and levelup
  • Fork abstract-leveldown to abstract-level
  • Implement deferredOpen
    • Skip deferred-leveldown in levelup if manifest indicates that db supports deferredOpen
    • Add exemption for deferred-leveldown in maybeError() that db.status may be opening
    • Level/deferred-leveldown#90
  • Add streams (I also want to consider removing streams, same as we did for write streams)

The fork provides a starting point for later breaking changes, and a turning point for dependents: when they make the switch from abstract-leveldown to abstract-level, they no longer need levelup. Up until this point, abstract-level is a drop-in replacement for abstract-leveldown that is still safe to be wrapped by levelup, yet behaves the same as levelup if you don't wrap it. This is very optimistic, mind you. I'm sure there will be stumbling blocks along the way that may force us to reconsider this approach.
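For the deferredOpen item, the basic mechanism could be sketched like this: operations that arrive while `status` is `'opening'` are queued and replayed once the db is open. `DeferredDb` and `markOpen()` are hypothetical, and opening is triggered by hand to keep the example synchronous; it is not how deferred-leveldown is implemented:

```javascript
'use strict'

class DeferredDb {
  constructor () {
    this.status = 'opening'
    this._store = new Map()
    this._queue = [] // operations received before the db is open
  }

  // Called once the underlying storage is ready. Done manually here;
  // a real db would do this at the end of its open routine.
  markOpen () {
    this.status = 'open'
    const queue = this._queue
    this._queue = []
    for (const operation of queue) operation()
  }

  put (key, value, callback) {
    if (this.status === 'opening') {
      this._queue.push(() => this.put(key, value, callback))
      return
    }
    this._store.set(key, value)
    callback(null)
  }

  get (key, callback) {
    if (this.status === 'opening') {
      this._queue.push(() => this.get(key, callback))
      return
    }
    if (!this._store.has(key)) return callback(new Error('NotFound'))
    callback(null, this._store.get(key))
  }
}

const deferred = new DeferredDb()
const calls = []
deferred.put('a', '1', (err) => calls.push(['put', err]))
deferred.get('a', (err, value) => calls.push(['get', err, value]))
const callsBeforeOpen = calls.length // nothing has run yet
deferred.markOpen() // replays put, then get, in order
```

If the db does this itself, a wrapper like levelup has nothing left to defer, which is what the manifest flag in the roadmap would signal.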

Let's make encodings "phase 2".

Flowchart for opening a db (based on latest levelup@4 and abstract-leveldown@6):

abstract-db - open transition

The "asBuffer problem"

level-ttl suffers from this (Level/level-ttl#68 (comment)) and it makes it impossible to implement clear() in a generic way (the code path there is different, but it's the same type of problem).

Comparing leveldown and memdown behavior; green is path taken.

abstract-db - asBuffer problem

In leveldown the Buffer type is a "transport type" used to avoid copying data. It actually gets stored as a byte array. In memdown a Buffer gets stored as-is.
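A toy illustration of that difference. `ByteStore` and `AsIsStore` are hypothetical and synchronous, and they do not reproduce the actual leveldown or memdown code paths; they only contrast "store the bytes" with "store the object":

```javascript
'use strict'

// Stores a copy of the bytes. The type of the input is forgotten,
// so get() decides the return type purely from `asBuffer`.
class ByteStore {
  constructor () {
    this._data = new Map()
  }

  put (key, value) {
    this._data.set(key, Uint8Array.from(Buffer.from(value)))
  }

  get (key, { asBuffer = true } = {}) {
    const buffer = Buffer.from(this._data.get(key))
    return asBuffer ? buffer : buffer.toString()
  }
}

// Stores whatever it was given. The type of the input leaks into get().
class AsIsStore {
  constructor () {
    this._data = new Map()
  }

  put (key, value) {
    this._data.set(key, value)
  }

  get (key, { asBuffer = true } = {}) {
    const value = this._data.get(key)
    if (asBuffer && !Buffer.isBuffer(value)) return Buffer.from(String(value))
    return value // with asBuffer: false this may still be a Buffer
  }
}

const bytes = new ByteStore()
const asIs = new AsIsStore()
bytes.put('k', Buffer.from('abc'))
asIs.put('k', Buffer.from('abc'))

const fromBytes = bytes.get('k', { asBuffer: false }) // always a string
const fromAsIs = asIs.get('k', { asBuffer: false }) // still a Buffer here
```

Adding a `toString()` branch to the as-is store would make the two agree, but then a value that was deliberately stored as a Buffer can no longer come back untouched.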

We could partially solve it in memdown by introducing another if-branch, but it'd break the id encoding:

abstract-db - asBuffer problem (1)

A proper long-term solution could be to 1) make memdown encoding-aware and 2) add some attribute to codecs to describe the expected return type (because a boolean (as)Buffer does not suffice).

Alternatively, we can change the utf8 codec to always return a string (currently its decode function is an identity function). I.e. reintroduce Level/codec#12. That's like a band-aid to keep an old band-aid in place.
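To show what that change would amount to, here are two toy utf8 codecs side by side. Both objects are illustrative stand-ins written for this comment, not the level-codec source:

```javascript
'use strict'

// Decode is an identity function: whatever the store returns comes out.
const utf8Identity = {
  encode: (data) => (Buffer.isBuffer(data) ? data : String(data)),
  decode: (data) => data
}

// Variant whose decode always returns a string.
const utf8AlwaysString = {
  encode: utf8Identity.encode,
  decode: (data) => (Buffer.isBuffer(data) ? data.toString('utf8') : String(data))
}

// Pretend a store handed back a Buffer even though a string was wanted.
const stored = Buffer.from('hello')

const viaIdentity = utf8Identity.decode(stored) // still a Buffer
const viaAlwaysString = utf8AlwaysString.decode(stored) // 'hello'
```

The second codec hides the store's behavior from the user, which is why it feels like a band-aid: the store still returns the wrong type, the codec just papers over it.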

💡 If abstract-leveldown implementations are encoding-aware then leveldown could implement the IndexedDB comparator (which sorts almost the same as bytewise/charwise) natively.

Calling this abstract-level going forward. It ain't down, it ain't up, it's level.

Let's make encodings "phase 2".

I think it will be easier to do this the right way from the get-go. I want to support encodings out of the box, replacing both _serialize() (which is just another encoding API with a different name) and asBuffer (which in a way expresses a relation between codecs, for example that json is also utf8). I'm working out the details.

This thread got messy with my frequent edits and inline task lists. Continuing at https://github.com/Level/abstract-level.