Range Scan?

Question

prologic opened this issue 5 years ago · 9 comments

is there support for doing range scans?

For example:

start := []byte("foo_10")
end := []byte("foo_20")
trie.ForEachRange(start, end, func(node art.Node) bool {
    // do something with node
    return true
})

I think only prefix iteration is supported currently? Would it be hard to add range iteration?

Answer 1 · 2019-09-21T05:02:12.000Z

It is unclear to me from your example what you mean by "range". Could you define it?

You can try to use ForEachPrefix:

key := []byte("foo_")
trie.ForEachPrefix(key, func(node art.Node) bool {
    // validate here if node in your range...
    return true
})

Answer 2 · 2019-09-21T07:27:51.000Z

Are you saying I can implement a range scan using the ForEachPrefix and testing that the node.Key() falls within my start and end?

Answer 3 · 2019-09-21T09:04:15.000Z

Yes. How are you going to check whether a node key falls in the [start, end] range?


Answer 4 · 2019-09-21T11:56:23.000Z

I think it's better if I just put up code that shows what I'm trying to accomplish.

See: https://github.com/prologic/bitcask/pull/101

Basically we are ranging over keys between a start and end key.

Answer 5 · 2019-09-21T12:02:28.000Z

Example (using the bitcask shell):

$ git clone https://github.com/prologic/bitcask
Cloning into 'bitcask'...
remote: Enumerating objects: 69, done.
remote: Counting objects: 100% (69/69), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 790 (delta 26), reused 44 (delta 12), pack-reused 721
Receiving objects: 100% (790/790), 255.30 KiB | 453.00 KiB/s, done.
Resolving deltas: 100% (434/434), done.
$ cd bitcask
$ git checkout range_scan
Branch 'range_scan' set up to track remote branch 'range_scan' from 'origin'.
Switched to a new branch 'range_scan'
$ make
bitcask version v0.3.4@d03eb48
bitcaskd version v0.3.4@d03eb48
$ for i in $(seq 1 9); do ./bitcask -p ./tmp.db set foo_$i $i; done
$ ./bitcask -p ./tmp.db range foo_3 foo_7
3
4
5
6
7

Hope this helps make things clearer.

Answer 6 · 2019-09-22T09:49:39.000Z

I see, you are using lexicographic comparison. See my idea below.

// tests here: https://play.golang.org/p/MsUL00KZqww
func longestCommonPrefix(start, end []byte) []byte {
	var min, max []byte
	var cmp = bytes.Compare(start, end) == -1
	if cmp {
		min, max = start, end
	} else {
		min, max = end, start
	}

	if len(min) == 0 {
		return nil
	}

	var i int
	for i = 0; i < len(min) && i < len(max) && min[i] == max[i]; i++ {
	}
	if i == 0 {
		return nil
	}

	return min[:i]
}

func executor(key, start, end []byte, f func(key []byte) error) (bool, error) {
	if bytes.Compare(key, start) < 0 || bytes.Compare(key, end) > 0 {
		return false, nil // key is out of range: skip it and continue iteration
	}

	if err := f(key); err != nil {
		return false, err // stop iteration
	}

	return true, nil // continue iteration
}

// Range performs a range scan over the keys between the start key and the
// end key (inclusive), calling the function `f` with each key found.
// If the function returns an error, no further keys are processed and that
// first error is returned.
func (b *Bitcask) Range(start, end []byte, f func(key []byte) error) (err error) {
	lcp := longestCommonPrefix(start, end)
	if lcp != nil {
		// We have the longest common prefix, so let's use `ForEachPrefix`
		// to minimize the number of iterations.
		b.trie.ForEachPrefix(lcp, func(node art.Node) (ok bool) {
			_, err = executor(node.Key(), start, end, f)
			return err == nil // stop only if f returned an error
		})
	} else {
		// No common prefix: iterate over all nodes and check whether
		// each node key is in the range.
		b.trie.ForEach(func(node art.Node) (ok bool) {
			_, err = executor(node.Key(), start, end, f)
			return err == nil // stop only if f returned an error
		})
	}
	return err
}

The above code was not tested. :)
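
For anyone who wants to try the idea without a trie, here is a minimal self-contained sketch. It assumes a plain slice of keys as a stand-in for the trie, with bytes.HasPrefix playing the role that ForEachPrefix plays above. The names commonPrefix and rangeKeys are hypothetical and are not part of the art library or bitcask.

```go
package main

import (
	"bytes"
	"fmt"
)

// commonPrefix returns the longest prefix shared by a and b
// (hypothetical helper, simplified from longestCommonPrefix above).
func commonPrefix(a, b []byte) []byte {
	i := 0
	for i < len(a) && i < len(b) && a[i] == b[i] {
		i++
	}
	return a[:i]
}

// rangeKeys returns the keys k with start <= k <= end in lexicographic
// byte order. The slice is an assumption standing in for the trie: the
// HasPrefix check mimics the keys a prefix iteration would never visit,
// and the Compare calls do the actual range filtering.
func rangeKeys(keys [][]byte, start, end []byte) [][]byte {
	lcp := commonPrefix(start, end)
	var out [][]byte
	for _, k := range keys {
		if !bytes.HasPrefix(k, lcp) {
			continue // outside the common prefix, so it cannot be in range
		}
		if bytes.Compare(k, start) >= 0 && bytes.Compare(k, end) <= 0 {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	var keys [][]byte
	for i := 1; i <= 9; i++ {
		keys = append(keys, []byte(fmt.Sprintf("foo_%d", i)))
	}
	keys = append(keys, []byte("bar_5"))

	// Prints foo_3 through foo_7, one per line.
	for _, k := range rangeKeys(keys, []byte("foo_3"), []byte("foo_7")) {
		fmt.Println(string(k))
	}
}
```

Skipping by prefix is safe because every key that sorts between start and end must share their common prefix, so the prefix check only removes keys the range check would reject anyway.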

Answer 7 · 2019-09-22T10:16:18.000Z

Ahh I see. Compute the longest common prefix of start and end and then use this as the starting point of ForEachPrefix()? Makes sense! Is this something we can build into your library? :D

Answer 8 · 2019-09-22T10:17:22.000Z

And yes I guess I meant Range Scan with lexicographic comparison as the keys are byte slices so I'm not sure any other kind of range scan makes sense? :)
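
One thing to keep in mind with lexicographic comparison: keys with unpadded numeric suffixes do not sort numerically. A small sketch using only the standard library's bytes.Compare; inRange is a hypothetical helper, and the key names and padding width are just for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// inRange reports whether start <= key <= end in lexicographic byte order
// (hypothetical helper, inclusive on both ends).
func inRange(key, start, end []byte) bool {
	return bytes.Compare(key, start) >= 0 && bytes.Compare(key, end) <= 0
}

func main() {
	start, end := []byte("foo_10"), []byte("foo_20")

	fmt.Println(inRange([]byte("foo_15"), start, end))  // true
	fmt.Println(inRange([]byte("foo_2"), start, end))   // true: "foo_2" is a prefix of "foo_20"
	fmt.Println(inRange([]byte("foo_100"), start, end)) // true: '1' < '2' at the first digit
	fmt.Println(inRange([]byte("foo_3"), start, end))   // false: '3' > '2' at the first digit

	// Zero-padding the numbers to a fixed width makes byte order
	// agree with numeric order.
	pStart, pEnd := []byte("foo_010"), []byte("foo_020")
	fmt.Println(inRange([]byte("foo_015"), pStart, pEnd)) // true
	fmt.Println(inRange([]byte("foo_002"), pStart, pEnd)) // false
	fmt.Println(inRange([]byte("foo_100"), pStart, pEnd)) // false
}
```

So a scan from foo_10 to foo_20 over unpadded keys would also pick up foo_2 and foo_100, which is probably not what the caller wants.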

Answer 9 · 2019-09-28T07:13:34.000Z

Kleene–Brouwer order? :)