Range Scan?
prologic opened this issue · 9 comments
is there support for doing range scans?
For example:
start := []byte("foo_10")
end := []byte("foo_20")
trie.ForEachRange(start, end, func(node art.Node) bool {
// do something with node
return true
})
I think only prefix iteration is support currently? Would it be hard to add range iteration?
It is unclear for me from your example what you mean under "range". Could you define it?
You can try to use ForEachPrefix
:
key := []byte("foo_")
trie.ForEachPrefix(key, func(node art.Node) bool {
// validate here if node in your range...
return true
})
Are you saying I can implement a range scan using the ForEachPrefix
and testing that the node.Key()
falls within my start
and end
?
I think its better if I just put up code that shows what I'm trying to accomplish.
See: https://github.com/prologic/bitcask/pull/101
Basically we are ranging over keys between a start and end key.
Example (using the bitcask
shell):
$ git clone https://github.com/prologic/bitcask
Cloning into 'bitcask'...
remote: Enumerating objects: 69, done.
remote: Counting objects: 100% (69/69), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 790 (delta 26), reused 44 (delta 12), pack-reused 721
Receiving objects: 100% (790/790), 255.30 KiB | 453.00 KiB/s, done.
Resolving deltas: 100% (434/434), done.
$ cd bitcask
$ git checkout range_scan
Branch 'range_scan' set up to track remote branch 'range_scan' from 'origin'.
Switched to a new branch 'range_scan'
$ make
bitcask version v0.3.4@d03eb48
bitcaskd version v0.3.4@d03eb48
$ for i in $(seq 1 9); do ./bitcask -p ./tmp.db set foo_$i $i; done
$ ./bitcask -p ./tmp.db range foo_3 foo_7
3
4
5
6
7
Hope this helps makes things more clear
I see, you are using lexicographic comparison. See my idea below.
// tests here: https://play.golang.org/p/MsUL00KZqww
func longestCommonPrefix(start, end []byte) []byte {
var min, max []byte
var cmp = bytes.Compare(start, end) == -1
if cmp {
min, max = start, end
} else {
min, max = end, start
}
if len(min) == 0 {
return nil
}
var i int
for i = 0; i < len(min) && i < len(max) && min[i] == max[i]; i++ {
}
if i == 0 {
return nil
}
return min[:i]
}
func executor(key, start, end []byte, f func(key []byte) error) (bool, error) {
if bytes.Compare(key, start) < 0 || bytes.Compare(key, end) > 0 {
return false, nil // contnue iteration
}
if err := f(key); err != nil {
return false, err // stop iteration
}
return true, nil // continue iteration
}
// Range performs a range scan of keys matching a range of keys between the
// start key and end key and calling the function `f` with the keys found.
// If the function returns an error no further keys are processed and the
// first error returned.
func (b *Bitcask) Range(start, end []byte, f func(key []byte) error) (err error) {
lcp := longestCommonPrefix(start, end)
if lcp != nil {
// we have the longest common prefix, lets use `ForEachPrefix`
// to minimize number of iterations
b.trie.ForEachPrefix(lcp, func(node art.Node) (ok bool) {
ok, err = executor(node.Key(), start, end, f)
return (!ok && err == nil) || ok
})
} else {
// iterate over all nodes and check if node.key in the range
b.trie.ForEach(func(node art.Node) (ok bool) {
ok, err = executor(node.Key(), start, end, f)
return (!ok && err == nil) || ok
})
}
}
The above code was not tested. :)
Ahh I see. Compute the longest prefix of start
and end
and then use this as the starting point of the ForEachPRefix()
? Makes sense! Is this something we can build into your library :D
And yes I guess I meant Range Scan with lexicographic comparison as the keys are byte slices so I'm not sure any other kind of range scan makes sense? :)
Kleene–Brouwer order? :)