incorrect behaviour of prefix extractor
Closed this issue · 5 comments
rocksdb version: 9.7.2
repro:
note: this is done using rust/rust-rocksdb crate but that shouldn't matter.
say my db has the following keys (values are not important for this)
123:example.com.my/john/hello/hi 456:789
and
123:example.com/blog/something 345:890
The key is stored as byte in the db (I call as_bytes()) before storing them
my prefix extractor:
fn custom_extractor(key: &[u8]) -> &[u8] {
// key format: <uid>:url <another uid>:<another uid>
let index = key
.iter()
.position(|&x| x == QUESTION_MARK || x == FORWARD_SLASH || x == SPACE);
index.map_or(key, |index| &key[..index])
}
which basically aims to extract the uid + host part of the url.
My prefix check function does the following:
let mut opts = ReadOptions::default();
opts.set_prefix_same_as_start(true);
let kv = DBIteratorWithThreadMode::<'a, Self>::new_cf(
self,
cf_handle.inner(),
opts,
IteratorMode::From(prefix.as_ref(), Direction::Forward),
).next()
kv <---- this is None, i.e no matching kv was found to return
in this case, when I call prefix_exists("123:example.com".as_bytes()), rocksdb doesn't return any kv. If i delete the 123:example.com.my/john/hello/hi 456:789
kv and then call the prefix_exists("123:example.com".as_bytes())
, I get back 123:example.com/blog/something 345:890
as expected.
Am I misunderstanding how prefix extractors work? While debugging this, I made sure my prefix extractor extracts 123:example.com
and 123:example.com.my
correctly from the full key. and given my prefix check is checking for 123:example.com
that should definitely match
Also when i enable total_order_seek
and bypass the bloom filter (technically using ribbon filter for this) while doing the prefix check, I get the expected key back so definitely something wrong with the prefix bloom?
another data point, checking if prefix: 123:example.com/
(notice the / at the end) works and returns 123:example.com/blog/something 345:890
as expected
it appears the main problem here is that the prefix lookup needs to use a prefix that is longer than the prefix used in the prefix bloom otherwise we hit this edge case hence why 123:example.com/
works since prefix in bloom is 123:example.com
and we are looking up a key that is longer. I don't know if this is expected/desired behaviour or a bug
@pdillinger I've noticed you working on improving prefix extractors lately in a1a102f , I'm wondering do you know if what I'm seeing here is expected? a bug? or some gotcha that is known?
This is actually a subtle, complex but reasonably well documented issue you've run into. See
rocksdb/include/rocksdb/options.h
Lines 255 to 265 in 26b4806
According to standard ASCII+bytewise key ordering, we have
(k1) 123:example.com
(k2) 123:example.com.my/john/hello/hi 456:789
(k3) 123:example.com/blog/something 345:890
But your prefix extractor assigns the same prefix to k1 and k3 and a different prefix to k2. Some of the description of new filtering features goes into why you want to make a trailing delimiter part of the key segment (or prefix).
rocksdb/include/rocksdb/experimental.h
Line 135 in 26b4806
Your options are basically to either
(a) consider the key "out of domain" of the prefix extractor if the delimiter is not there (and have to use a delimiter in your seek key),
(b) include the delimiter in the prefix (and similarly have to use the same delimiter in your seek key),
(c) change your comparator so that your delimiter bytes are ordered before any other bytes, or
(d) transform your key into and out of RocksDB so that your delimiters are transliterated to the smallest byte values.
got it, thank you!