multiread

Question

multiread

dominictarr opened this issue 11 years ago · 11 comments

An operation that is like a batch, but for gets, but without the keys being adjacent in order.

I mention this because at campjs @rvagg and I discussed where the boundry between userland and core should be, regarding issues like: #9 (comment)

We couldn't think of many features that you might want to do in the C binding, but we decided that we should evaluate them on a case-by-case basis for a bit, until it becomes clearer.

So this issue is mainly intended for discussion, although it may be useful.

hmm, so for the purposes of discussion, the api is

db.getAll(arrayOfKeys, cb)

Would an operation like this also benefit from crossing the c-js boundry only once,
passing an array each way?

Answer 1 · 2013-02-24T10:47:59.000Z

Should benefit very much, I think @juliangruber had something similar in Leveled? We can do multi-threaded get()s at the moment so it's not all bad, but we have a limit to the number of threads.

It's interesting to think about this in relation to a range delete cause they could share a similar API.

db.getAll(array, cb)
db.delAll(array, cb)
db.delRange(options, cb) // where options is the same as for db.readStream()
db.getRange(options, cb) // where options is the same as for db.readStream()

Of course the getRange() on is a bit trickier cause of potential size problems, perhaps we should put an upper limit on it or at least enforce the use of a 'limit' property so at least they know what they are doing (they could put limit:99999 if they wanted but then at least it'd be explicit).

Answer 2 · 2013-02-24T11:11:28.000Z

I had db#range(from, to, cb) and db#find(glob, cb), no multiget though. For those methods the performance gain was huge because iterators require so many js-c boundry crossings.

With a multiget the gain would increase with the numbers of keys given. I'm not sure yet about when this would pay off. Someone would have to do an implementation so we can see.

Oh, and I'm for overloading the db#get function for this.

Answer 3 · 2013-02-24T22:51:25.000Z

getRange could just alias to iterator, with the limit set very large.
I'm opposed to adding safety features to leveldown, but am happy to have them in levelup.

see section about the "The Hole Hawg" in this article http://artlung.com/smorgasborg/C_R_Y_P_T_O_N_O_M_I_C_O_N.shtml

Answer 4 · 2014-05-12T02:57:51.000Z

ref #9

Marking as "help wanted" if someone wants to come up with an interesting solution for this that we may want to consider merging in. Again, unlikely to be something exposed at the LevelUP layer but might be nice to have down here.

Answer 5 · 2019-01-02T15:59:14.000Z

Hi there,

I've implemented this functionality, it looks like this:

db = leveldown (...), db.open, etc...

db.map (keys, [options, ], function (error, values) {
    // whatever you want to do here
});

options are the same ones used by .get (... options, ...).

keys is an array of strings / buffers.
values is an array of strings / buffers / undefineds, returned in same order of supplied keys. Keys not found are left undefined in resulting values array, keys found are returned as buffers or strings (depending on supplied options or defaults).

In my use case I'm actually checking for pre-existing non-consecutive keys before insertion. I did some benchmarks with 1 million keys and found some interesting results:

If the keys are all pre-existing, .map returns values three times faster than using .get
If the keys don't exist, .map returns values about ten times faster. This is usually the case for me so I'm going to fork leveldown, but anyway... I have a question to see if I can submit a pull request:

I followed other methods implementation, so had to edit both abstract-leveldown and leveldown. Added these methods:

abstract-leveldown -> AbstractLevelDOWN.prototype.map (parameter validation, options defaults, call to this._map)
leveldown -> LevelDOWN.prototype._map (call to binding.map_do)
binding.cc -> MapWorker, NAPI_METHOD(map_do)

My question is if I should pull-request the two separate github projects (abstract-leveldown & leveldown) or should I modify the implementation to keep all changes inside levedown?

abstract-leveldown could polyfill ._map implementation when not found, by issuing ._get by hand for every supplied key instead (however this ~~would be~~ is a lot slower) (benchmarked it too).

Don't hesitate to ask me any question, english is not my native language and I may have screwed the explanation a bit 😄

Answer 6 · 2019-01-02T18:09:48.000Z

@matonga nice work!

With iterator.seek(), we've made it possible for "multiread" aka "batch get" to be implemented in userland too. Or indeed as an abstract-leveldown fallback, though at the moment leveldown is the only implementation that supports seek(). See Level/levelup#491 (comment):

I think, rather than a batch get, I prefer to officially expose iterators with a seek() method. This not only enables fetching multiple non-contiguous keys but also getting multiple ranges of keys. You seek, read some, seek again. Also you get backpressure and can do things like skip scans (for compound aka composite indexes).

Would be interesting to compare that against your native solution. I'm guessing yours will be faster because in theory it only has to cross the C++/JS boundary once. Is your code on GitHub?

Answer 7 · 2019-01-02T18:25:44.000Z

My question is if I should pull-request the two separate github projects (abstract-leveldown & leveldown) or should I modify the implementation to keep all changes inside levedown?

It's fine for a new feature to live in leveldown for a while, for people to play with. The same happened with iterator.seek(). The barrier for new abstract-leveldown features is fairly high: there must be a good use case (that can't be solved with existing features), a solid API and at least a few other implementations that can support it (in theory).

Answer 8 · 2019-01-02T20:18:45.000Z

I'm guessing yours will be faster because in theory it only has to cross the C++/JS boundary once. Is your code on GitHub?

@vweevers that's right, I find it works faster, maybe because it crosses the C++/JS boundary once or maybe because there is less overhead (one Javascript call to map() vs. many calls to get() / seek()), or maybe it's a combination of both. I've just uploaded the code to GitHub:

matonga@b86b0a1

Any suggestions welcome. Guess I should write some test/map-test.js for a proper pull request?

The barrier for new abstract-leveldown features is fairly high: there must be a good use case (that can't be solved with existing features), a solid API and at least a few other implementations that can support it (in theory).

Just in case, I've moved all code to leveldown.

Answer 9 · 2019-01-02T21:21:00.000Z

matonga@b86b0a1

Any suggestions welcome. Guess I should write some test/map-test.js for a proper pull request?

Looks like a viable solution. We can discuss details in the PR :) It's up to you whether you want to write a test before opening a PR. It may help us to understand the intended behavior. It would be great if you can share your benchmarks as well (code and results). Thanks for taking this on!

Answer 10 · 2021-09-30T21:15:08.000Z

Level/community#101

Answer 11 · 2021-10-01T20:47:19.000Z

@vweevers thank you! good to see a multiread implementation being merged into leveldown and related repos at last.