facebook/rocksdb

incorrect behaviour of prefix extractor

Closed this issue · 5 comments

rocksdb version: 9.7.2

repro:

note: this is done using rust/rust-rocksdb crate but that shouldn't matter.

say my db has the following keys (values are not important for this)

123:example.com.my/john/hello/hi 456:789

and

123:example.com/blog/something 345:890

The key is stored as byte in the db (I call as_bytes()) before storing them

my prefix extractor:

fn custom_extractor(key: &[u8]) -> &[u8] {
    // key format: <uid>:url <another uid>:<another uid>

    let index = key
        .iter()
        .position(|&x| x == QUESTION_MARK || x == FORWARD_SLASH || x == SPACE);

    index.map_or(key, |index| &key[..index])
}

which basically aims to extract the uid + host part of the url.

My prefix check function does the following:

        let mut opts = ReadOptions::default();
        opts.set_prefix_same_as_start(true);
       let kv = DBIteratorWithThreadMode::<'a, Self>::new_cf(
            self,
            cf_handle.inner(),
            opts,
            IteratorMode::From(prefix.as_ref(), Direction::Forward),
        ).next()
        
        kv <---- this is None, i.e no matching kv was found to return

in this case, when I call prefix_exists("123:example.com".as_bytes()), rocksdb doesn't return any kv. If i delete the 123:example.com.my/john/hello/hi 456:789 kv and then call the prefix_exists("123:example.com".as_bytes()), I get back 123:example.com/blog/something 345:890 as expected.

Am I misunderstanding how prefix extractors work? While debugging this, I made sure my prefix extractor extracts 123:example.com and 123:example.com.my correctly from the full key. and given my prefix check is checking for 123:example.com that should definitely match

Also when i enable total_order_seek and bypass the bloom filter (technically using ribbon filter for this) while doing the prefix check, I get the expected key back so definitely something wrong with the prefix bloom?

another data point, checking if prefix: 123:example.com/ (notice the / at the end) works and returns 123:example.com/blog/something 345:890 as expected

it appears the main problem here is that the prefix lookup needs to use a prefix that is longer than the prefix used in the prefix bloom otherwise we hit this edge case hence why 123:example.com/ works since prefix in bloom is 123:example.com and we are looking up a key that is longer. I don't know if this is expected/desired behaviour or a bug

@pdillinger I've noticed you working on improving prefix extractors lately in a1a102f , I'm wondering do you know if what I'm seeing here is expected? a bug? or some gotcha that is known?

This is actually a subtle, complex but reasonably well documented issue you've run into. See

// Together `prefix_extractor` and `comparator` must satisfy one essential
// property for valid prefix filtering of range queries:
// If Compare(k1, k2) <= 0 and Compare(k2, k3) <= 0 and
// InDomain(k1) and InDomain(k3) and prefix(k1) == prefix(k3),
// Then InDomain(k2) and prefix(k2) == prefix(k1)
//
// In other words, all keys with the same prefix must be in a contiguous
// group by comparator order, and cannot be interrupted by keys with no
// prefix ("out of domain"). (This makes it valid to conclude that no
// entries within some bounds are present if the upper and lower bounds
// have a common prefix and no entries with that same prefix are present.)

According to standard ASCII+bytewise key ordering, we have

(k1) 123:example.com
(k2) 123:example.com.my/john/hello/hi 456:789
(k3) 123:example.com/blog/something 345:890

But your prefix extractor assigns the same prefix to k1 and k3 and a different prefix to k2. Some of the description of new filtering features goes into why you want to make a trailing delimiter part of the key segment (or prefix).

// * Up to *and including* a delimiter byte or codeword.

Your options are basically to either
(a) consider the key "out of domain" of the prefix extractor if the delimiter is not there (and have to use a delimiter in your seek key),
(b) include the delimiter in the prefix (and similarly have to use the same delimiter in your seek key),
(c) change your comparator so that your delimiter bytes are ordered before any other bytes, or
(d) transform your key into and out of RocksDB so that your delimiters are transliterated to the smallest byte values.

got it, thank you!