Merge levelup into abstract-leveldown
Closed this issue · 11 comments
This is just an idea at this point, briefly discussed offline; we're not committing to anything. The premise: make any abstract-leveldown implementation a standalone, ready-to-go database.
Refactorings in the past years have increased modularity. It has allowed us to clean up various internals. That said, now that we have, there are downsides to the current architecture, with its many layers:
- Adding custom behavior to e.g. level means having to peel off layers first, then add them back (most notably in subleveldown - that code is gnarly)
- The layers share API surface with treacherously subtle differences (e.g. leveldown by itself returns Buffers by default, level(up) returns strings by default).
- There are 4 forms of "encoding" now: encoding-down, serialization, asBuffer, and whatever the underlying storage does (e.g. IDB returning a Buffer as Uint8Array). Yet for an abstract-leveldown implementation there's no builtin primitive to decode/deserialize data.
- Breaking changes in one layer typically bubble up, leading to what we call "the release dance". Prior to that, canary testing is often required because we can't foresee everything.
- A rabbit hole of documentation and changelogs (level links to levelup links to encoding-down links to level-codec gives an example about leveldown links to level - you get the point)
- Double allocation of callback functions, options objects, etc.
- No builtin mechanism for asynchronous hooks (this could be achieved as another abstract-leveldown layer, but that doesn't make it available to levelup per se) (related: #44).
- Implementing manifests (#42) is hard.
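To make the "subtle differences" point concrete, here is a minimal sketch of two layers that share a get() method but disagree on the default return type. FakeStore and FakeWrapper are hypothetical stand-ins written for this comment, not the real leveldown or levelup modules, and they are synchronous for brevity (the real APIs are asynchronous).

```javascript
// Hypothetical stand-ins; not the real leveldown or levelup modules.
class FakeStore {
  constructor () { this._data = new Map() }

  put (key, value) {
    this._data.set(String(key), Buffer.from(String(value)))
  }

  // Lower layer: returns a Buffer unless asBuffer is explicitly false
  get (key, options = {}) {
    const value = this._data.get(String(key))
    if (value === undefined) return undefined
    return options.asBuffer === false ? value.toString('utf8') : value
  }
}

class FakeWrapper {
  constructor (store) { this._store = store }

  put (key, value) { this._store.put(key, value) }

  // Upper layer: returns a string unless a buffer is requested
  get (key, options = {}) {
    const asBuffer = options.valueEncoding === 'buffer'
    return this._store.get(key, { asBuffer })
  }
}

const store = new FakeStore()
const db = new FakeWrapper(store)
db.put('a', 'hello')

console.log(Buffer.isBuffer(store.get('a'))) // true
console.log(typeof db.get('a')) // string
```

Same method name, same arguments, different result type depending on which layer you happen to hold.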
Back to the idea: there are many open questions. Making a POC is probably the best way forward. To be clear: I don't want it to be a big batteries-included database. I think what we need are simpler primitives to enable composing (and maintaining!) such a database. Some "batteries" could be worth including, if they provide a foundation for userland modularity.
We have been recommending that people use levelup, and more so, level and friends, which is a clear indication that much of the functionality belongs in a single core component. Or to put it in other words, a single shell (the public API of abstract-leveldown) around a nut (the private API of abstract-leveldown).
I'm all for simplifying. There's also an issue of performance. Maybe we could benchmark the hell out of level and compare with a benchmark done on a POC.
- What would this mean for current implementations of abstract-leveldown?
- Are we aiming for a smooth transition for implementations?
- How would encodings fit in? (it must be easy to pull in e.g. bytewise and charwise).
Love the picture btw. This style is really nice. We should reuse it for documentation, blog post etc. How did you make it?
Photoshop. I wanted to try its 3D features. And I'm never gonna do it again, lol. Took 10 crashes and retries and then 20 minutes waiting time just to render some damn plastic material
Some illustrated thoughts about encodings (take with a grain of salt).
I was thinking about benefits of keeping encoding-down as-is. It might be a more flexible building block in scenarios where you need to bypass encoding, or perform encoding early-on. For example, when you have a server and client that both read from a db, and data must be encoded prior to transport from client to server. In addition, the server has a separate sublevel. Along the lines of:
It gets more complicated when you introduce manifests. Let's say server-side you are indexing data (with some fictional module that is also an abstract-leveldown implementation), and you want the client to be able to read from the index through a custom query() method. That method is advertised through a manifest. And the first (key-like) argument of the method must be encoded.
So what would it look like if encodings were built into abstract-leveldown? Take a get operation as an example. The multilevel(down) client could say: it's on me to parse the value as JSON when I get it, and I can parse strings, so I'll pass down { valueEncoding: 'utf8' }.
This could be optimized perhaps, by using the id encoding downstream, avoiding String(String(..)).
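A rough sketch of that get flow, assuming a db with encodings built in. SketchDb, the codecs table and clientGet are hypothetical names made up for illustration (only the valueEncoding option and the utf8, json and id encoding names come from the discussion above), and everything is synchronous to keep it short.

```javascript
// Hypothetical codecs; only the names utf8, json and id come from the thread.
const codecs = {
  utf8: { encode: String, decode: (data) => data.toString('utf8') },
  json: { encode: JSON.stringify, decode: (data) => JSON.parse(data.toString('utf8')) },
  id: { encode: (x) => x, decode: (x) => x }
}

// Hypothetical db that applies an encoding itself instead of relying on a wrapper.
// Missing keys are not handled in this sketch.
class SketchDb {
  constructor () { this._data = new Map() }

  put (key, value, options = {}) {
    const codec = codecs[options.valueEncoding || 'utf8']
    this._data.set(key, Buffer.from(codec.encode(value)))
  }

  get (key, options = {}) {
    const codec = codecs[options.valueEncoding || 'utf8']
    return codec.decode(this._data.get(key))
  }
}

// The client takes responsibility for JSON and only asks the db for a string
function clientGet (db, key) {
  const text = db.get(key, { valueEncoding: 'utf8' })
  return JSON.parse(text)
}

const db = new SketchDb()
db.put('user', { name: 'alice' }, { valueEncoding: 'json' })

const user = clientGet(db, 'user')
console.log(user.name) // alice

// With the id encoding the stored data is handed over untouched
console.log(Buffer.isBuffer(db.get('user', { valueEncoding: 'id' }))) // true
```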
The indexing scenario would be similar (with a pseudocode argEncoding option):
What bothers me though, is that encoding options can't describe what you have, they describe what you want (to happen). What if we split that, taking inspiration from HTTP's content negotiation?
In the indexing scenario, the argEncoding option would describe which encoding was already applied to the argument, similar to HTTP's Content-Encoding:
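One way to picture the have/want split is a small helper that is told both which encoding was already applied and which encoding the receiving layer expects. This is only a sketch under assumptions: prepareArg, have and want are hypothetical names, and the codecs table is a simplified stand-in.

```javascript
// Hypothetical codecs that always encode to a Buffer
const codecs = {
  utf8: {
    encode: (value) => Buffer.from(String(value), 'utf8'),
    decode: (data) => data.toString('utf8')
  },
  json: {
    encode: (value) => Buffer.from(JSON.stringify(value), 'utf8'),
    decode: (data) => JSON.parse(data.toString('utf8'))
  }
}

// `have` names the encoding already applied to `data` (null if none),
// `want` names the encoding the receiving layer expects.
function prepareArg (data, have, want) {
  // Already in the wanted encoding: pass through untouched
  if (have === want) return data

  // Otherwise undo what was applied (if anything), then apply what is wanted
  const value = have ? codecs[have].decode(data) : data
  return codecs[want].encode(value)
}

// A client that encoded early says so, and no work is repeated downstream
const preEncoded = codecs.json.encode({ id: 1 })
console.log(prepareArg(preEncoded, 'json', 'json') === preEncoded) // true

// A client that did not encode gets the encoding applied for it
console.log(prepareArg({ id: 1 }, null, 'json').equals(preEncoded)) // true
```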
Are we aiming for a smooth transition for implementations?
I say yes. The first version can and should be a drop-in replacement, to keep all the history and get regression tests for free.
Rough roadmap
- Increase API parity between abstract-leveldown and levelup
  - Level/levelup#660
  - Level/levelup#674
  - Level/levelup#677
  - Level/levelup#692
  - Level/abstract-leveldown#364
  - Level/levelup@4b35716, Level/levelup@cfce6bb
  - Have abstract-leveldown use level-errors (semver-major if any error messages need to change)
  - Remove need for db.isClosed() and isOpen() in tests (semver-patch) (relevant test)
  - Add _nextTick to levelup (see TODO comment in subleveldown)
  - Add .status to levelup
  - Track open/closed state in abstract-leveldown, same as levelup
  - Perform type checks in same order (e.g. check key before callback) (semver-minor or semver-major, TBD)
  - Add promise support to abstract-leveldown
  - Make abstract-leveldown an EventEmitter
  - Emit the same events as levelup
  - Move open() options to constructor (semver-minor, by temporarily supporting options in both places)
- Fork abstract-leveldown to abstract-level
- Implement deferredOpen
  - Skip deferred-leveldown in levelup if manifest indicates that db supports deferredOpen
  - Add exemption for deferred-leveldown in maybeError() that db.status may be opening
  - Level/deferred-leveldown#90
- Add streams (I also want to consider removing streams, same as we did for write streams)
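A few of the parity items above (status tracking, being an EventEmitter, promise support) could look roughly like this. SketchLevel is a hypothetical class written for this comment, not code from abstract-leveldown or levelup; the status values and event names follow what levelup uses, and process.nextTick stands in for the real asynchronous work of an implementation.

```javascript
const { EventEmitter } = require('events')

// Hypothetical sketch; not the actual abstract-leveldown code.
class SketchLevel extends EventEmitter {
  constructor () {
    super()
    this.status = 'new'
  }

  open (callback) {
    // No callback given: return a promise that wraps the callback form
    if (typeof callback !== 'function') {
      return new Promise((resolve, reject) => {
        this.open((err) => (err ? reject(err) : resolve()))
      })
    }

    this.status = 'opening'
    this.emit('opening')

    // Stand-in for the asynchronous work a real implementation would do
    process.nextTick(() => {
      this.status = 'open'
      this.emit('open')
      callback()
    })
  }
}

const db = new SketchLevel()
db.on('open', () => console.log('status is now', db.status))
db.open().then(() => console.log('opened via promise'))
```

The promise form is layered on the callback form here so that there is a single code path for state changes and events.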
The fork provides a starting point for later breaking changes, and a turning point for dependents: when they make the switch from abstract-leveldown to abstract-level, they no longer need levelup. Up until this point abstract-level is a drop-in replacement for abstract-leveldown that is also still safe to be wrapped by levelup - yet behaves the same as levelup if you don't. This is very optimistic, mind you. I'm sure there will be stumbling blocks along the way that may force us to reconsider this approach.
Let's make encodings "phase 2".
The "asBuffer problem"
level-ttl suffers from this (Level/level-ttl#68 (comment)) and it makes it impossible to implement clear() in a generic way (the code path there is different, but it's the same type of problem).
Comparing leveldown and memdown behavior; green is path taken.
In leveldown the Buffer type is a "transport type" used to avoid copying data. It actually gets stored as a byte array. In memdown a Buffer gets stored as-is.
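As a simplified illustration of that difference, here are two hypothetical stores: one keeps only bytes and has to be told which type to return, the other keeps whatever object it was given. ByteStore and ObjectStore are made-up names and do not reproduce the actual leveldown or memdown code paths.

```javascript
// Keeps raw bytes only; the original JavaScript type is not remembered,
// so the caller has to say which type it wants back.
class ByteStore {
  constructor () { this._data = new Map() }

  put (key, value) {
    this._data.set(key, Uint8Array.from(Buffer.from(value)))
  }

  get (key, { asBuffer = true } = {}) {
    const bytes = Buffer.from(this._data.get(key))
    return asBuffer ? bytes : bytes.toString('utf8')
  }
}

// Keeps the value object as-is, so the type survives a round trip
class ObjectStore {
  constructor () { this._data = new Map() }
  put (key, value) { this._data.set(key, value) }
  get (key) { return this._data.get(key) }
}

const bytes = new ByteStore()
const objects = new ObjectStore()

bytes.put('k', 'abc')
objects.put('k', 'abc')

// Same input, same call, different result type
console.log(Buffer.isBuffer(bytes.get('k'))) // true
console.log(typeof objects.get('k')) // string
```

A generic module sitting on top of either store cannot tell which of the two behaviors it will get without knowing the implementation.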
We could partially solve it in memdown by introducing another if-branch, but it'd break the id encoding:
A proper long-term solution could be to 1) make memdown encoding-aware and 2) add some attribute to codecs to describe the expected return type (because a boolean (as)Buffer does not suffice).
Alternatively, we can change the utf8 codec to always return a string (currently its decode function is an identity function). I.e. reintroduce Level/codec#12. That's like a band-aid to keep an old band-aid in place.
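To show what that change would mean in practice, here is a sketch of two utf8 codecs: one whose decode is an identity function and one that always returns a string. Both objects are hypothetical and written for this comment; they are not copied from level-codec.

```javascript
// decode is an identity function: whatever the store returns is passed on
const utf8Identity = {
  encode: (value) => String(value),
  decode: (data) => data
}

// decode always returns a string, regardless of what the store returned
const utf8Strict = {
  encode: (value) => String(value),
  decode: (data) => Buffer.isBuffer(data) ? data.toString('utf8') : String(data)
}

// Suppose the underlying store handed back a Buffer
const fromStore = Buffer.from('hello')

console.log(Buffer.isBuffer(utf8Identity.decode(fromStore))) // true
console.log(utf8Strict.decode(fromStore)) // hello
```

The strict variant hides the store's behavior from the user, which is why it works as a band-aid rather than a fix for the underlying type mismatch.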
💡 If abstract-leveldown implementations are encoding-aware then leveldown could implement the IndexedDB comparator (which sorts almost the same as bytewise/charwise) natively.
Calling this abstract-level going forward. It ain't down, it ain't up, it's level.
Let's make encodings "phase 2".
I think it will be easier to do this the right way from the get-go. I want to support encodings out of the box, replacing both _serialize() (which is just another encoding API with a different name) and asBuffer (which in a way expresses a relation between codecs, for example that json is also utf8). I'm working out the details.
This thread got messy with my frequent edits and inline task lists. Continuing at https://github.com/Level/abstract-level.