facebook/rocksdb

KeyMayExist dumps core when value pointer is null

mdcallag opened this issue · 3 comments

Note: Please use Issues only for bug reports. For questions, discussions, feature requests, etc. post to dev group: https://groups.google.com/forum/#!forum/rocksdb or https://www.facebook.com/groups/rocksdb.dev

Expected behavior

I expect no core dump when the value pointer is NULL.

I assume it should be safe to pass NULL for the value pointer in cases where I don't want to get the value. Otherwise, this method has no purpose because if I must fetch the value then I can use Get(). Also, the comments imply that I can pass NULL (see here) and the comment includes this text.

  // If the key definitely does not exist in the database, then this method
  // returns false, else true. If the caller wants to obtain value when the key
  // is found in memory, a bool for 'value_found' must be passed. 'value_found'

Actual behavior

I get a segfault when the value pointer is null with this command line: ./db_bench --benchmarks=fillseq,readrandomfast
After applying this diff:

diff --git a/tools/db_bench_tool.cc b/tools/db_bench_tool.cc
index 713aaaa41..4c00a1e04 100644
--- a/tools/db_bench_tool.cc
+++ b/tools/db_bench_tool.cc
@@ -6034,6 +6034,11 @@ class Benchmark {
           ts_ptr = &ts_ret;
         }
         auto status = db->Get(options, key, &value, ts_ptr);
+       {
+          bool b;
+          // auto status2 = db->KeyMayExist(options, key, &value, ts_ptr, &b);
+          auto status2 = db->KeyMayExist(options, key, NULL, ts_ptr, &b);
+       }
         if (status.ok()) {
           ++found;
         } else if (!status.IsNotFound()) {

And the stack trace

rocksdb::DBImpl::KeyMayExist (this=this@entry=0x7ffff735d000, read_options=..., column_family=0x7ffff723b760, key=..., value=value@entry=0x0, timestamp=timestamp@entry=0x0, value_found=0x7fffefff4e9f) at db/db_impl/db_impl.cc:3792
3792      value->assign(pinnable_val.data(), pinnable_val.size());
(gdb) where
#0  rocksdb::DBImpl::KeyMayExist (this=this@entry=0x7ffff735d000, read_options=..., column_family=0x7ffff723b760, key=..., value=value@entry=0x0, timestamp=timestamp@entry=0x0, value_found=0x7fffefff4e9f) at db/db_impl/db_impl.cc:3792
#1  0x00005555556792f0 in rocksdb::DB::KeyMayExist (value_found=0x7fffefff4e9f, timestamp=0x0, value=0x0, key=..., options=..., this=<optimized out>) at ./include/rocksdb/db.h:987
#2  rocksdb::Benchmark::ReadRandomFast (this=0x7fffffffd600, thread=0x7ffff738f800) at tools/db_bench_tool.cc:6040
#3  0x0000555555667a4a in rocksdb::Benchmark::ThreadBody (v=0x7ffff7346780) at tools/db_bench_tool.cc:3973
#4  0x00005555558bf8f9 in rocksdb::(anonymous namespace)::StartThreadWrapper (arg=0x7ffff7287130) at env/env_posix.cc:469
#5  0x00007ffff76c3ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007ffff7755850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Steps to reproduce the behavior

hi @mdcallag ,

The issue arises from the sample code accessing the incorrect overload of the KeyMayExist method. There are two overloaded functions: one that includes a timestamp and one that does not.

// If the key definitely does not exist in the database, then this method
// returns false, else true. If the caller wants to obtain value when the key
// is found in memory, a bool for 'value_found' must be passed. 'value_found'
// will be true on return if value has been set properly.
// This check is potentially lighter-weight than invoking DB::Get(). One way
// to make this lighter weight is to avoid doing any IOs.
// Default implementation here returns true and sets 'value_found' to false

  1. withtimestamp:
    virtual bool KeyMayExist(const ReadOptions& /options/,
    ColumnFamilyHandle* /column_family/,
    const Slice& /key/, std::string* /value/,
    std::string* /timestamp/,
    bool* value_found = nullptr) {
    if (value_found != nullptr) {
    *value_found = false;
    }
    return true;
    }

  2. Without Timestamp:
    virtual bool KeyMayExist(const ReadOptions& options,
    ColumnFamilyHandle* column_family, const Slice& key,
    std::string* value, bool* value_found = nullptr) {
    return KeyMayExist(options, column_family, key, value,
    /timestamp=/nullptr, value_found);
    }

the second overload, which omits the timestamp, works as expected because it handles without requiring a valid pointer for the value. Therefore, to achieve the desired functionality, the method without the timestamp should be called.

Still get a core dump after I changed my testcase to use the method without a timestamp

Thread 37 "db_bench.dbg" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff03f8640 (LWP 2422119)]
rocksdb::DBImpl::KeyMayExist (this=0x7ffff6f73000, read_options=..., column_family=0x7ffff6e3aac0, key=..., value=0x0, timestamp=0x0, value_found=0x7ffff03f5e9f) at db/db_impl/db_impl.cc:3792
3792      value->assign(pinnable_val.data(), pinnable_val.size());
(gdb) where
#0  rocksdb::DBImpl::KeyMayExist (this=0x7ffff6f73000, read_options=..., column_family=0x7ffff6e3aac0, key=..., value=0x0, timestamp=0x0, value_found=0x7ffff03f5e9f) at db/db_impl/db_impl.cc:3792
#1  0x000055555570846a in rocksdb::DB::KeyMayExist (this=this@entry=0x7ffff6f73000, options=..., column_family=<optimized out>, key=..., value=value@entry=0x0, value_found=value_found@entry=0x7ffff03f5e9f) at ./include/rocksdb/db.h:975
#2  0x000055555567ac9b in rocksdb::DB::KeyMayExist (value_found=0x7ffff03f5e9f, value=0x0, key=..., options=..., this=<optimized out>) at ./include/rocksdb/db.h:981
#3  rocksdb::Benchmark::ReadRandomFast (this=0x7fffffffd5b0, thread=0x7ffff6f5a800) at tools/db_bench_tool.cc:6042
#4  0x00005555556693da in rocksdb::Benchmark::ThreadBody (v=0x7ffff6f39780) at tools/db_bench_tool.cc:3973
#5  0x00005555558c2de9 in rocksdb::(anonymous namespace)::StartThreadWrapper (arg=0x7ffff6e78100) at env/env_posix.cc:469
#6  0x00007ffff7294ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#7  0x00007ffff7326850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

New diff is:

diff --git a/tools/db_bench_tool.cc b/tools/db_bench_tool.cc
index 713aaaa41..7998a1b2f 100644
--- a/tools/db_bench_tool.cc
+++ b/tools/db_bench_tool.cc
@@ -6034,6 +6034,13 @@ class Benchmark {
           ts_ptr = &ts_ret;
         }
         auto status = db->Get(options, key, &value, ts_ptr);
+       {
+          bool b;
+          // auto status2 = db->KeyMayExist(options, key, &value, ts_ptr, &b);
+          // auto status2 = db->KeyMayExist(options, key, NULL, ts_ptr, &b);
+          std::string* vptr = nullptr;
+          auto status2 = db->KeyMayExist(options, key, vptr, &b);
+       }
         if (status.ok()) {
           ++found;
         } else if (!status.IsNotFound()) {

In the context of RocksDB, I see the DB class serves as the public interface that users interact with, while the DBImpl class contains the actual implementation details. The assertion assert(value != nullptr); is a safety check designed to ensure that the pointer value is not nullptr before proceeding with further operations that assume it points to a valid memory location.