Forking indexes for hyperlog
Built on the map/reduce pattern; hyperlog-index will call a map function on every hyperlog insert, building the index incrementally.
Using hyperlog-index, we can easily build a key/value store backed by a hyperlog that implements a multi-value register conflict strategy:
var level = require('level')
var indexer = require('hyperlog-index')
var hyperlog = require('hyperlog')
var sub = require('subleveldown')
var mkdirp = require('mkdirp')
var minimist = require('minimist')

var argv = minimist(process.argv.slice(2), {
  default: { d: '/tmp/kv.db' }
})
mkdirp.sync(argv.d)

var hdb = level(argv.d + '/h')
var idb = level(argv.d + '/i')
var log = hyperlog(hdb, { valueEncoding: 'json' })
var db = sub(idb, 'x', { valueEncoding: 'json' })

var dex = indexer({
  log: log,
  db: sub(idb, 'i'),
  map: function (row, next) {
    // This method reduces our new state. In this example, db is used for the state.
    db.get(row.value.k, function (err, doc) {
      if (!doc) doc = {}
      row.links.forEach(function (link) {
        delete doc[link]
      })
      doc[row.key] = row.value.v
      db.put(row.value.k, doc, next)
    })
  }
})

if (argv._[0] === 'get') {
  dex.ready(function () {
    db.get(argv._[1], function (err, values) {
      if (err) console.error(err)
      else console.log(values)
    })
  })
} else if (argv._[0] === 'put') {
  // Structure `doc` as expected by `map` above
  var doc = { k: argv._[1], v: argv._[2] }
  dex.ready(function () {
    db.get(doc.k, function (err, values) {
      // Link the new entry to the "parents", from the current index, if any
      log.add(Object.keys(values || {}), doc, function (err, node) {
        if (err) console.error(err)
      })
    })
  })
} else if (argv._[0] === 'sync') {
  var r = log.replicate()
  process.stdin.pipe(r).pipe(process.stdout)
  r.on('end', function () { process.stdin.pause() })
}
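The reduction inside map can be separated from storage to make the multi-value register rule easier to see. What follows is a minimal sketch, not part of hyperlog-index: applyRow is a hypothetical helper, and the short hashes (h1, h2, ...) are made up for illustration.

```javascript
// Hypothetical helper (not part of hyperlog-index): the same reduction that
// `map` performs, without the database. `doc` maps node hashes to values.
function applyRow (doc, row) {
  var next = Object.assign({}, doc)
  // A new node supersedes every node it links to.
  row.links.forEach(function (link) {
    delete next[link]
  })
  next[row.key] = row.value.v
  return next
}

// Made-up short hashes stand in for real hyperlog keys.
var doc = {}
doc = applyRow(doc, { key: 'h1', links: [], value: { k: 'A', v: 'beep' } })
doc = applyRow(doc, { key: 'h2', links: ['h1'], value: { k: 'A', v: 'boop' } })

// A concurrent edit from another log does not link to h2, so both survive.
var forked = applyRow(doc, { key: 'h3', links: [], value: { k: 'A', v: 'whatever' } })

// An edit that links to both heads collapses them into one value.
var merged = applyRow(forked, { key: 'h4', links: ['h2', 'h3'], value: { k: 'A', v: 'whatboop' } })

console.log(forked) // { h2: 'boop', h3: 'whatever' }
console.log(merged) // { h4: 'whatboop' }
```

The helper copies the document rather than mutating it, which is a choice made for this sketch only; the example above mutates the stored object and writes it back.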
Each key maps to an object of hashes to values:
$ node kv.js -d /tmp/db1 put A beep
$ node kv.js -d /tmp/db1 put A boop
$ node kv.js -d /tmp/db1 get A
{ '06e4130fc5f2392cb8bdb065d18eaa523d716f2c61b4877853340a5cc727fb42': 'boop' }
Meanwhile, a second database may have additional edits:
$ node kv.js -d /tmp/db2 put A whatever
$ node kv.js -d /tmp/db2 put B hey
When these two databases are merged together, the key A has two values:
$ dupsh 'node kv.js -d /tmp/db1 sync' 'node kv.js -d /tmp/db2 sync'
$ node kv.js -d /tmp/db1 get A
{ '06e4130fc5f2392cb8bdb065d18eaa523d716f2c61b4877853340a5cc727fb42': 'boop',
cba756b45e279ae5c3f3ebc8cfe0d50e1f2205e37a4443ce9e0e5a41491c234c: 'whatever' }
The B key has only a single element:
$ node kv.js -d /tmp/db1 get B
{ '53a374617fb8839b6f19646d6658188a4fc08d19f35c084dab835847532a3468': 'hey' }
This is because put links each new node to the values currently indexed for its key, while sync only copies nodes between logs and adds no links.
A new update that links to both existing nodes merges them into a single value:
$ node kv.js -d /tmp/db1 put A whatboop
$ node kv.js -d /tmp/db1 get A
{ '85915730b3e7a4f715057e74af79b564a5be2ec14d334d344cb84d1544ec6107': 'whatboop' }
and these merges can be communicated over replication:
$ dupsh 'node kv.js -d /tmp/db1 sync' 'node kv.js -d /tmp/db2 sync'
$ node kv.js -d /tmp/db2 get A
{ '85915730b3e7a4f715057e74af79b564a5be2ec14d334d344cb84d1544ec6107': 'whatboop' }
And the index can be destroyed (and recalculated) at any time:
$ rm -rf /tmp/db1/i
$ node kv.js -d /tmp/db1 get A
{ '85915730b3e7a4f715057e74af79b564a5be2ec14d334d344cb84d1544ec6107': 'whatboop' }
This is a useful strategy when you need to update the code in your indexes.
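Deleting the index is safe because the hyperlog itself keeps every row, so an index amounts to a replay of the log through map. The sketch below models that idea in memory. rebuildIndex, the rows array and both map functions are hypothetical stand-ins, and a Map takes the place of leveldb.

```javascript
// Hypothetical stand-in for the indexer: replay every row through `map`,
// one at a time, then report completion. Not the hyperlog-index API.
function rebuildIndex (rows, map, done) {
  var i = 0
  ;(function step (err) {
    if (err) return done(err)
    if (i === rows.length) return done(null)
    map(rows[i++], step)
  })()
}

// Made-up rows in log order; a real hyperlog supplies key, links and value.
var rows = [
  { key: 'h1', links: [], value: { k: 'A', v: 'beep' } },
  { key: 'h2', links: ['h1'], value: { k: 'A', v: 'boop' } },
  { key: 'h3', links: [], value: { k: 'B', v: 'hey' } }
]

// Version 1 of the index code: current values per key.
var values = new Map()
function mapValues (row, next) {
  var doc = values.get(row.value.k) || {}
  row.links.forEach(function (link) { delete doc[link] })
  doc[row.key] = row.value.v
  values.set(row.value.k, doc)
  next()
}

// Version 2, written later: number of writes per key.
var counts = new Map()
function mapCounts (row, next) {
  counts.set(row.value.k, (counts.get(row.value.k) || 0) + 1)
  next()
}

rebuildIndex(rows, mapValues, function (err) {
  if (err) throw err
  console.log(values.get('A')) // { h2: 'boop' }
})
rebuildIndex(rows, mapCounts, function (err) {
  if (err) throw err
  console.log(counts.get('A')) // 2
})
```

Because the rows never change, swapping mapValues for mapCounts and replaying gives a different index from the same history.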
Note: the included example (example/kv.js) assumes the value is a JSON object, so the command line put format will be more like this:
$ node example/kv.js -d /tmp/db1 put A '{"baap":"boop"}'
Note: if you are primarily interested in a key/value index like the one in this example, check out hyperkv.
var indexer = require('hyperlog-index')
Create a new hyperlog index instance dex from:
opts.log - a hyperlog instance (required)
opts.db - a level instance (required)
opts.map - an indexing function, function (row, next) {}
You can have as many indexes as you like on the same log; just create more dex instances on sublevels.
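As a sketch of what several dex instances on sublevels might look like, the function below builds two indexes over one log. createIndexes, the sublevel names and both map bodies are hypothetical. indexer, sub, log and idb are passed in as arguments and are assumed to be the hyperlog-index module, subleveldown, a hyperlog and a level instance, as in the example above.

```javascript
// Hypothetical helper: two independent indexes over the same hyperlog.
// `indexer` is assumed to be require('hyperlog-index') and `sub` to be
// require('subleveldown'); they are injected so the sketch has no imports.
function createIndexes (indexer, sub, log, idb) {
  // Each index keeps its bookkeeping and its data in its own sublevels.
  var kvdb = sub(idb, 'kv-data', { valueEncoding: 'json' })
  var countdb = sub(idb, 'count-data', { valueEncoding: 'json' })

  var kv = indexer({
    log: log,
    db: sub(idb, 'kv-index'),
    map: function (row, next) {
      // Keeps only the most recently indexed value, for brevity.
      kvdb.put(row.value.k, row.value.v, next)
    }
  })

  var count = indexer({
    log: log,
    db: sub(idb, 'count-index'),
    map: function (row, next) {
      // Counts how many rows have been indexed for each key.
      countdb.get(row.value.k, function (err, n) {
        countdb.put(row.value.k, (n || 0) + 1, next)
      })
    }
  })

  return { kv: kv, count: count, kvdb: kvdb, countdb: countdb }
}
```

Giving each index its own sublevel for opts.db keeps the progress of one index from interfering with the other.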
The indexing function opts.map runs for each row. It should write its computed indexes to durable storage and call next(err) when it is finished.
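Here is a minimal sketch of an indexing function that reports failures through next(err) rather than ignoring them. makeMap is a hypothetical wrapper, db is assumed to be a levelup-style store with JSON value encoding, and the err.notFound check assumes the store marks missing-key errors the way levelup does.

```javascript
// Hypothetical factory: returns a map function bound to `db`, which is
// assumed to be a levelup-style store (get/put with node-style callbacks).
function makeMap (db) {
  return function map (row, next) {
    db.get(row.value.k, function (err, doc) {
      // A missing key is expected for the first write; anything else is
      // a real failure and is handed to next(err).
      if (err && !err.notFound) return next(err)
      if (!doc) doc = {}
      row.links.forEach(function (link) {
        delete doc[link]
      })
      doc[row.key] = row.value.v
      // next is called by put, so only after the store has answered.
      db.put(row.value.k, doc, next)
    })
  }
}
```

The result can be passed as opts.map. Handing next straight to db.put means a failed write is reported as next(err) without extra code.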
dex.ready(fn) registers the callback fn() to fire when the indexes have "caught up" to the latest known change in the hyperlog. The fn() function fires exactly once. You may call dex.ready() multiple times with different functions.
dex.pause() pauses calculation of the indexes. dex.ready() will not fire until the indexes have been resumed.
dex.resume() resumes calculation of the indexes after dex.pause().
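One possible use of pausing, sketched under assumptions: hold indexing back during a bulk import, then resume and wait for the index to catch up. withPausedIndex is a hypothetical helper, and dex is assumed to expose pause(), resume() and ready(fn).

```javascript
// Hypothetical helper. `dex` is assumed to expose pause(), resume() and
// ready(fn); `work` is any function that takes a node-style callback.
function withPausedIndex (dex, work, cb) {
  dex.pause()
  work(function (err) {
    // Always resume, even on failure, so later dex.ready() calls can fire.
    dex.resume()
    if (err) return cb(err)
    // ready() fires once the index has caught up with the log.
    dex.ready(function () { cb(null) })
  })
}
```

A caller might pass a function that adds many entries to the log as work, and read from the index only inside cb.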
If the underlying system generates an error, you can catch it by listening for the 'error' event on dex.
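A short sketch of error handling, assuming dex emits an 'error' event in the manner of a Node.js EventEmitter. A plain EventEmitter stands in for dex so the snippet runs without a database; with the real module the listener would be attached to the indexer instance instead.

```javascript
var EventEmitter = require('events').EventEmitter

// Stand-in for a hyperlog-index instance (assumption: it is an EventEmitter).
var dex = new EventEmitter()

var seen = []
dex.on('error', function (err) {
  // Record and report the error instead of letting an unhandled
  // 'error' event crash the process.
  seen.push(err)
  console.error('index error:', err.message)
})

// Simulate a failure from the underlying storage.
dex.emit('error', new Error('write failed'))
```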
With npm do:
npm install hyperlog-index
MIT