plar/go-adaptive-radix-tree

Range Scan?

prologic opened this issue · 9 comments

is there support for doing range scans?

For example:

start := []byte("foo_10")
end := []byte("foo_20")
trie.ForEachRange(start, end, func(node art.Node) bool {
    // do something with node
    return true
})

I think only prefix iteration is support currently? Would it be hard to add range iteration?

plar commented

It is unclear for me from your example what you mean under "range". Could you define it?

You can try to use ForEachPrefix:

key := []byte("foo_")
trie.ForEachPrefix(key, func(node art.Node) bool {
    // validate here if node in your range...
    return true
})

Are you saying I can implement a range scan using the ForEachPrefix and testing that the node.Key() falls within my start and end?

plar commented

I think its better if I just put up code that shows what I'm trying to accomplish.

See: https://github.com/prologic/bitcask/pull/101

Basically we are ranging over keys between a start and end key.

Example (using the bitcask shell):

$ git clone https://github.com/prologic/bitcask
Cloning into 'bitcask'...
remote: Enumerating objects: 69, done.
remote: Counting objects: 100% (69/69), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 790 (delta 26), reused 44 (delta 12), pack-reused 721
Receiving objects: 100% (790/790), 255.30 KiB | 453.00 KiB/s, done.
Resolving deltas: 100% (434/434), done.
$ cd bitcask
$ git checkout range_scan
Branch 'range_scan' set up to track remote branch 'range_scan' from 'origin'.
Switched to a new branch 'range_scan'
$ make
bitcask version v0.3.4@d03eb48
bitcaskd version v0.3.4@d03eb48
$ for i in $(seq 1 9); do ./bitcask -p ./tmp.db set foo_$i $i; done
$ ./bitcask -p ./tmp.db range foo_3 foo_7
3
4
5
6
7

Hope this helps makes things more clear

plar commented

I see, you are using lexicographic comparison. See my idea below.

// tests here: https://play.golang.org/p/MsUL00KZqww
func longestCommonPrefix(start, end []byte) []byte {
	var min, max []byte
	var cmp = bytes.Compare(start, end) == -1
	if cmp {
		min, max = start, end
	} else {
		min, max = end, start
	}

	if len(min) == 0 {
		return nil
	}

	var i int
	for i = 0; i < len(min) && i < len(max) && min[i] == max[i]; i++ {
	}
	if i == 0 {
		return nil
	}

	return min[:i]
}

func executor(key, start, end []byte, f func(key []byte) error) (bool, error) {
	if bytes.Compare(key, start) < 0 || bytes.Compare(key, end) > 0 {
		return false, nil // contnue iteration
	}

	if err := f(key); err != nil {
		return false, err // stop iteration
	}

	return true, nil // continue iteration
}

// Range performs a range scan of keys matching a range of keys between the
// start key and end key and calling the function `f` with the keys found.
// If the function returns an error no further keys are processed and the
// first error returned.
func (b *Bitcask) Range(start, end []byte, f func(key []byte) error) (err error) {
	lcp := longestCommonPrefix(start, end)
	if lcp != nil {
                // we have the longest common prefix, lets use `ForEachPrefix`
                // to minimize number of iterations
		b.trie.ForEachPrefix(lcp, func(node art.Node) (ok bool) {
			ok, err = executor(node.Key(), start, end, f)
			return (!ok && err == nil) || ok
		})
	} else {
                // iterate over all nodes and check if node.key in the range
	        b.trie.ForEach(func(node art.Node) (ok bool) {
			ok, err = executor(node.Key(), start, end, f)
			return (!ok && err == nil) || ok
		})
	}
}

The above code was not tested. :)

Ahh I see. Compute the longest prefix of start and end and then use this as the starting point of the ForEachPRefix()? Makes sense! Is this something we can build into your library :D

And yes I guess I meant Range Scan with lexicographic comparison as the keys are byte slices so I'm not sure any other kind of range scan makes sense? :)

plar commented

Kleene–Brouwer order? :)