Improve throughput of read streams by transferring multiple records at once
vweevers opened this issue · 12 comments
Working on the `level-bench` benchmarks got me thinking. Currently `level-iterator-stream` ignores the `size` argument of `stream._read(size)`. Per tick it transfers only 1 db record from the underlying iterator to the stream's buffer. I think we can be smarter about this, by carrying the knowledge that `size` records are requested all the way down to the db (in the case of `leveldown`, down to the C++, potentially replacing its current read-ahead cache mechanism).
In pseudo-code it would look something like this (ignore error handling for a moment):

```js
// level-iterator-stream
ReadStream.prototype._read = function (size) {
  var self = this

  // Fetch <size> records from the db, then call "visit" repeatedly within a tick
  this._iterator.visit(size, function visit (record) {
    // Record is either null, an object { key, value }, or just a key or value
    self.push(record)
  })
}
```
This also avoids allocating 3 callback functions per record. Alternatively:

```js
this._iterator.nextv(size, function (records) { // aka "chunks" in streams
  for (let record of records) self.push(record)
})
```
Or, if streams were to get a `.pushv` method similar to `.writev`:

```js
this._iterator.nextv(size, function (records) {
  self.pushv(records)
})
```
/cc @mcollina: could such an API be faster? I'm also wondering how `_read()` behaves in an asyncIterator. Is `size` always 1 in that case, or does the stream read ahead?
Respecting `size` is good for streams, but I guess moving from an eager read-ahead cache to an on-demand read-ahead cache won't make much difference, because it cannot reduce the number of times we cross the C++ / JS boundary?
Getting more than one item at the same time will significantly increase throughput.
> Is `size` always 1 in that case, or does the stream read ahead?
Streams read ahead according to `highWaterMark`. Essentially it would try to read 16 entries by default.
On-demand read-ahead will guarantee some performance speedup because it reduces the time an object lives on the heap: as a result, the GC ends up doing less work. I would recommend implementing this anyway.
With error handling included, I propose the following API:

```js
iterator.nextv(size, function (err, records) {
  if (err) // errored
  if (records.length === 0) // reached end
  for (let record of records) // ..
})
```
- If `size < 0` then `nextv` yields all records (or as many as `limit`) (there is no default safeguard)
- If `size === 0` then `nextv` yields 0 records (it's illegal to call `nextv(0)` more than once)
- `nextv` respects the boolean `keys` and `values` options passed to the iterator ctor. If both are true, `record` will be a `{ key, value }` object. Otherwise `record` will be either a key or a value.
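A userland sketch of these semantics, layered on top of a plain `next()` callback API for illustration (the in-memory iterator is a stand-in for a real db iterator; `limit` handling and the `nextv(0)` restriction are omitted). A native implementation would fetch the whole batch in one C++ call instead of looping:

```js
// Sketch only: nextv() built on repeated next() calls
function nextv (iterator, size, callback) {
  const records = []

  if (size === 0) return process.nextTick(callback, null, records)

  iterator.next(function loop (err, record) {
    if (err) return callback(err)
    if (record == null) return callback(null, records) // reached end
    records.push(record)
    if (size >= 0 && records.length >= size) return callback(null, records)
    iterator.next(loop)
  })
}

// Minimal in-memory iterator with the same next() signature, for illustration
function arrayIterator (entries) {
  let i = 0
  return {
    next (cb) {
      process.nextTick(cb, null, i < entries.length ? entries[i++] : null)
    }
  }
}
```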
We can play with this idea in `leveldown`: introduce a `nextv` method on the iterator, combine it with a temporary fork of `level-iterator-stream` that utilizes `nextv`, then benchmark it.
It might increase the number of times we cross the C++ / JS boundary, especially for small records, because the `highWaterMark` of streams in `objectMode` is measured in number of records, while the `highWaterMark` of leveldown's iterator is measured in bytes. IMO this does not matter because both these parameters can be tweaked by the user as necessary. Would it warrant semver-major though?
What changes to leveldown would this require? Changing the C++ iterator cache size (per iterator) to match `nextv(size, ...)`?
Depends on what we want to do with the `highWaterMark` (in bytes) logic. I'm now wondering whether, in current code, `db.createReadStream({ highWaterMark })` passes `highWaterMark` to both the iterator and the stream. That would be a problem.
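One hedged way to keep the two meanings apart (the `highWaterMarkBytes` option name below is made up for this example, not an existing API) is to split the options up front, so the byte-based value only reaches the iterator and the record-count value only reaches the stream:

```js
// Illustrative sketch: separate the byte-based iterator highWaterMark
// from the stream's record-count highWaterMark so one option can't
// silently leak into both consumers.
function splitOptions (options) {
  const { highWaterMarkBytes, highWaterMark, ...iteratorOptions } = options

  return {
    // Byte-based read-ahead budget for the db iterator
    iteratorOptions: { ...iteratorOptions, highWaterMark: highWaterMarkBytes },
    // Record-count buffering for the object-mode stream
    streamOptions: { objectMode: true, highWaterMark }
  }
}
```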
👍
We implement `iterator.batchNext()` instead of `iterator.nextv()` in a userland library. We use the undocumented `iterator.cache` field.

When creating a leveldown iterator we set the high water mark to `1024 * 1024` aka 1 MiB to make sure the `cache` is populated.

Implementing a native `nextv()` in leveldown, so that we don't access the private `.cache` field, would be nice.
I didn't even know you could specify it on the iterator options (it's not in the docs). Is `highWaterMark` intentionally undocumented? Also, is this something that is (or can be) made uniform across AbstractLevelDown implementations?
@MeirionHughes It's missing from the docs (Level/leveldown#468).

> Also is this something that is (or can be) made uniform across AbstractLevelDown implementations?

Only `leveldown` and `rocksdb` can support the byte-based hwm.
How would y'all feel about altogether removing the hwm measured in bytes, in favor of a `nextv()` measured in number of records? Every `abstract-leveldown` implementation can support that.
👍 for `nextv()`
I think removing `highWaterMark` in `leveldown` itself is a breaking change in terms of default performance. You will want to benchmark the throughput and/or the number of C / JS boundary crossings in `leveldown` with and without `highWaterMark`, to make sure there's no performance regression in removing it.
Alternatively phrased: if there's no `highWaterMark` in leveldown, we might want to implement the `next` method in terms of calling `nextv`, returning the first key/value pair and assigning the remainder to the existing `iterator.cache` field.
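As a sketch of that idea (the class and field names below are illustrative, not leveldown's actual internals), `next()` can pop from an internal array and only fall back to `nextv()` when the array runs dry:

```js
// Sketch: next() implemented on top of nextv(), keeping surplus records
// in an internal array so subsequent next() calls avoid a boundary crossing.
class CachedIterator {
  constructor (nextvImpl, batchSize = 16) {
    this._nextv = nextvImpl // (size, cb) -> cb(err, records)
    this._batchSize = batchSize
    this._cache = []
  }

  next (callback) {
    if (this._cache.length > 0) {
      // Served from the cache: no call into the underlying nextv
      return process.nextTick(callback, null, this._cache.shift())
    }

    this._nextv(this._batchSize, (err, records) => {
      if (err) return callback(err)
      if (records.length === 0) return callback(null, null) // reached end
      this._cache = records
      callback(null, this._cache.shift())
    })
  }
}
```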
Done in `abstract-level`, `memory-level` and `classic-level` (not on npm yet), when combined with `level-read-stream`.

When not using streams, you can still benefit from the new machinery by using `nextv()`. And on `classic-level`, `next()` has the same performance characteristics as before (on `leveldown`). As for hwm, there are 2 options now: Level/classic-level#1.