Refactor encodings
vweevers opened this issue · 6 comments
As written in #58, there are too many forms of encoding now. Yet for an abstract-leveldown
implementation there's no builtin primitive to decode/deserialize data.
So here's a WIP plan for refactoring. I'll be editing this.
- Add a
format
property to each encoding- This specifies what it encodes to. One of
utf8
,buffer
,view
,id - For
json
it will beutf8
, forutf8
it will beutf8
, etc
- This specifies what it encodes to. One of
- Rename the
binary
encoding tobuffer
, withbuffer
as an alias (in limited places, i.e. where we already supported this) - Add a new
view
encoding that works with typed arrays (including Buffer because that's a UInt8Array)- Only support Uint8Array for now, maybe DataView later
- Change the
utf8
encoding to always return a string, because return types of.encode()
must be predictable - Add encodings as a feature to
abstract-leveldown
- It will encode keys & values in the public API, and forward
encoding.format
to the private API - For example,
put('a', 123, { valueEncoding: 'json' })
could result in_put('a', '123', { valueEncoding: 'utf8' })
- Like
level
, the default encoding will beutf8
- It will encode keys & values in the public API, and forward
- An implementation must advertise its "native" encodings to
abstract-leveldown
via the manifest- For example,
memdown
will set{ encodings: { buffer: true } }
which says "I want buffers" - This could also be configurable! If you want to use typed arrays instead:
memdown({ storeEncoding: 'view' })
which will tellmemdown
to set{ encodings: { view: true } }
- andmemdown
won't have to do any translation itself
- For example,
- Implement "transcoder" encodings that translate from e.g.
utf8
tobuffer
.- Their
.type
will be e.g.utf8+buffer
,json+view
- Their
- These will be also advertised in the manifest, but by
abstract-leveldown
.- So on
memdown
you'll seedb.supports.encodings.json
even thoughmemdown
itself doesn't handlejson
- So on
- Remove
asBuffer
,valueAsBuffer
,keyAsBuffer
- Remove
_serializeKey
,_serializeValue
- The only remaining use case that I see for serialization is key prefixing
- When needed,
abstract-leveldown
will lookup a transcoder.- For example on
memdown
, adb.get(9)
will result in_get(<Buffer 39>)
by passing it trough the transcoderutf8+buffer
.
- For example on
-
In theory we can also support native encodings. If an implementation advertises e.g.encodings: { json: true }
, we can bypassabstract-leveldown
encoding. But if the encoding is idempotent (meaning.type === .format
) we won't, in order to still normalize user input. - Remove lesser-used encodings that can be moved to userland:
ascii
,ucs2
andutf16le
- This makes it easier to support typed arrays and to make
buffer
optional, becauseutf8
is supported by both Buffer and TextEncoder
- This makes it easier to support typed arrays and to make
- If an encoding is unsupported, throw an error with
{ code: LEVEL_ENCODING_NOT_SUPPORTED }
, rather than falling back to theid
(identity) encoding - Level/abstract-leveldown#361
Updated the above with a more complete plan, that removes the need for serialization and asBuffer
options, and also paves the way for typed arrays.
Example of a transcoder (some should be hardcoded to optimize them, others like json+buffer
can be dynamic):
exports['utf8+view'] = new Encoding({
encode (data) {
return ArrayBuffer.isView(data) ? data : textEncoder.encode(data)
},
decode (data) {
return textDecoder.decode(data)
},
type: 'utf8+view',
format: 'view'
})
Doing this in abstract-leveldown
could have performance benefits too. For example, if we know that a particular value encoding on a get()
operation is idempotent (like buffer
or utf8
) and we know that the implementation supports that encoding "natively", then we can do _get(..., callback)
without wrapping the callback in another function that decodes the value.
Ugh, this works better than I expected. Now subleveldown
can just forward operations to a db, without unwrapping or rewrapping that db with encoding-down
or levelup
. Doesn't matter what encoding options that db used, or whether it stores data as buffers or Uint8Array internally. On the subleveldown
db you can use any encoding too, including Uint8Array even though subleveldown
internally works with Buffers and strings. It just fucking works.
The following tests are passing (locally; haven't pushed yet):
Click to expand
const test = require('tape')
const suite = require('abstract-leveldown/test')
const memdown = require('memdown')
const subleveldown = require('subleveldown')
// Test abstract-leveldown compliance
function runSuite (factory) {
suite({ test, factory })
}
// Test basic prefix
runSuite(function factory (opts) {
return subleveldown(memdown(), 'test', opts)
})
// Test empty prefix
runSuite(function factory (opts) {
return subleveldown(memdown(), '', opts)
})
// Test custom separator
runSuite(function factory (opts) {
return subleveldown(memdown(), 'test', { ...opts, separator: '%' })
})
// Test on db with buffer encoding
runSuite(function factory (opts) {
return subleveldown(memdown({ keyEncoding: 'buffer' }), 'test', opts)
})
// Test on db with view encoding (Uint8Array)
runSuite(function factory (opts) {
return subleveldown(memdown({ keyEncoding: 'view' }), 'test', opts)
})
// Have memdown internally use views too
runSuite(function factory (opts) {
return subleveldown(memdown({ keyEncoding: 'view', storeEncoding: 'view' }), 'test', opts)
})
// Lastly, for good measure:
runSuite(function factory (opts) {
return subleveldown(memdown({ keyEncoding: 'buffer', storeEncoding: 'view' }), 'test', opts)
})
@ralphtheninja @juliangruber @MeirionHughes ARE YOU EXCITED? Because I am! Fuck!
Sounds like this gets us a lot of flexibility and api simplicity as well. Well done! 👏
Added support of other ecosystem encodings (codecs
, abstract-encoding
, multiformats
) to level-transcoder
, fixed some bugs, and updated its README: https://github.com/Level/transcoder. That part is now done.