facebook/rocksdb

ldb repair core dump

ltagliamonte-dd opened this issue · 2 comments

Hello folks,
first-time RocksDB user here, and I love it!
I'd appreciate some help with an issue I'm facing.

I'm using the kvrocks project, which is based on RocksDB 8.11.4, and unfortunately I'm dealing with a corrupted DB:

ERR Corruption: block checksum mismatch: stored = 875772211, computed = 2997829094, type = 4  in /data/kvrocks/db/003571.sst offset 763295792 size 16306
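
For anyone scripting follow-up checks against the damaged file, the path, offset, and size can be pulled out of that message. This is only a sketch that assumes the exact message layout shown above; `parse_corruption` is a hypothetical helper name, not part of RocksDB or kvrocks.

```shell
#!/bin/sh
# Hypothetical helper: extract "<sst path> <offset> <size>" from a
# "block checksum mismatch" message in the format quoted in this issue.
# Prints nothing if the line does not match. Paths containing spaces
# would make the space-separated output ambiguous.
parse_corruption() {
    printf '%s\n' "$1" |
        sed -n 's/.* in \(.*\.sst\) offset \([0-9][0-9]*\) size \([0-9][0-9]*\).*/\1 \2 \3/p'
}
```

The three fields can then be read with `set -- $(parse_corruption "$line")` and used as `$1`, `$2`, `$3`.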

I'm trying to repair the database with the ldb utility, but when I run /opt/rocksdb/ldb repair --db=/data/kvrocks/db/ I get:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

I've also tried running ldb under gdb to get a stack trace:

gdb  --args /opt/rocksdb/ldb repair --db=/data/kvrocks/db/
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/rocksdb/ldb...
(gdb) run
Starting program: /opt/rocksdb/ldb repair --db=/data/kvrocks/db/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6d62700 (LWP 1166)]
[New Thread 0x7ffff6361700 (LWP 1167)]
[New Thread 0x7ffff5960700 (LWP 1168)]
[New Thread 0x7ffff4f5f700 (LWP 1169)]
[New Thread 0x7fffeffff700 (LWP 1170)]
[New Thread 0x7fffef5fe700 (LWP 1171)]
[Thread 0x7fffeffff700 (LWP 1170) exited]
[Thread 0x7fffef5fe700 (LWP 1171) exited]
[New Thread 0x7fffeebfd700 (LWP 1172)]
[Thread 0x7fffeebfd700 (LWP 1172) exited]
[New Thread 0x7fffee1fc700 (LWP 1173)]
[New Thread 0x7fffed7fb700 (LWP 1174)]
[Thread 0x7fffee1fc700 (LWP 1173) exited]
[Thread 0x7fffed7fb700 (LWP 1174) exited]
[New Thread 0x7fffecdfa700 (LWP 1175)]
[Thread 0x7fffecdfa700 (LWP 1175) exited]
[New Thread 0x7fffec3f9700 (LWP 1176)]
[New Thread 0x7fffeb9f8700 (LWP 1177)]
[Thread 0x7fffec3f9700 (LWP 1176) exited]
[Thread 0x7fffeb9f8700 (LWP 1177) exited]
[New Thread 0x7fffeaff7700 (LWP 1178)]
[New Thread 0x7fffea5f6700 (LWP 1179)]
[Thread 0x7fffeaff7700 (LWP 1178) exited]
[Thread 0x7fffea5f6700 (LWP 1179) exited]
[New Thread 0x7fffe9bf5700 (LWP 1180)]
[Thread 0x7fffe9bf5700 (LWP 1180) exited]
[New Thread 0x7fffe91f4700 (LWP 1181)]
[New Thread 0x7fffe87f3700 (LWP 1182)]
[Thread 0x7fffe91f4700 (LWP 1181) exited]
[Thread 0x7fffe87f3700 (LWP 1182) exited]
[New Thread 0x7fffe7df2700 (LWP 1183)]
[New Thread 0x7fffe73f1700 (LWP 1184)]
[Thread 0x7fffe7df2700 (LWP 1183) exited]
[Thread 0x7fffe73f1700 (LWP 1184) exited]
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

Thread 1 "ldb" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff706b537 in __GI_abort () at abort.c:79
#2  0x00007ffff72d17ec in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff72dc966 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff72dc9d1 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff72dcc65 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff72d142a in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff7be589b in rocksdb::StderrLogger::Logv (this=0x5555555cb440, format=0x7fffffff82e0 "[WARN] [%s:845] Tail prefetch size %zu is calculated based on heuristics", ap=0x7fffffff8518)
    at util/stderr_logger.cc:47
#8  0x00007ffff79fb1fe in rocksdb::Logger::Logv (this=0x5555555cb440, log_level=<optimized out>, format=<optimized out>, ap=0x7fffffff8518) at env/env.cc:883
#9  0x00007ffff79fe2db in rocksdb::Logv (ap=0x7fffffff8518, format=0x7ffff7d8b6b8 "[%s:845] Tail prefetch size %zu is calculated based on heuristics", info_log=0x5555555cb440, log_level=rocksdb::WARN_LEVEL)
    at env/env.cc:901
#10 rocksdb::Logv (ap=0x7fffffff8518, format=0x7ffff7d8b6b8 "[%s:845] Tail prefetch size %zu is calculated based on heuristics", info_log=0x5555555cb440, log_level=rocksdb::WARN_LEVEL) at env/env.cc:895
#11 rocksdb::Log (log_level=log_level@entry=rocksdb::WARN_LEVEL, info_log=info_log@entry=0x5555555cb440, format=format@entry=0x7ffff7d8b6b8 "[%s:845] Tail prefetch size %zu is calculated based on heuristics")
    at env/env.cc:910
#12 0x00007ffff7b04bbc in rocksdb::BlockBasedTable::PrefetchTail (ro=..., file=0x5555555ce7f0, file_size=1126376725, force_direct_prefetch=<optimized out>, tail_prefetch_stats=<optimized out>, prefetch_all=true, 
    preload_all=true, prefetch_buffer=0x7fffffff8810, stats=0x0, tail_size=0, logger=0x5555555cb440) at ./logging/logging.h:24
#13 0x00007ffff7b0ba36 in rocksdb::BlockBasedTable::Open (read_options=..., ioptions=..., env_options=..., table_options=..., internal_comparator=..., file=..., file_size=1126376725, 
    block_protection_bytes_per_key=0 '\000', table_reader=0x7fffffff9058, tail_size=0, table_reader_cache_res_mgr=std::shared_ptr<rocksdb::CacheReservationManager> (empty) = {...}, 
    prefix_extractor=std::shared_ptr<const rocksdb::SliceTransform> (empty) = {...}, prefetch_index_and_filter_in_cache=true, skip_filters=false, level=-1, immortal_table=false, largest_seqno=0, 
    force_direct_prefetch=false, tail_prefetch_stats=0x5555555b7768, block_cache_tracer=0x0, max_file_size_for_l0_meta_pin=0, cur_db_session_id="A6CGBBKZY5HGGT8AIVIO", cur_file_num=2156, expected_unique_id=..., 
    user_defined_timestamps_persisted=true) at table/block_based/block_based_table_reader.cc:608
#14 0x00007ffff7af18e0 in rocksdb::BlockBasedTableFactory::NewTableReader (this=this@entry=0x5555555b7640, ro=..., table_reader_options=..., file=..., file_size=file_size@entry=1126376725, 
    table_reader=0x7fffffff9058, prefetch_index_and_filter_in_cache=true) at table/block_based/block_based_table_factory.cc:581
#15 0x00007ffff7963115 in rocksdb::TableCache::GetTableReader (this=0x5555555fc230, ro=..., file_options=..., internal_comparator=..., file_meta=..., sequential_mode=false, 
    block_protection_bytes_per_key=0 '\000', file_read_hist=0x0, table_reader=0x7fffffff9058, prefix_extractor=std::shared_ptr<const rocksdb::SliceTransform> (empty) = {...}, skip_filters=false, level=-1, 
    prefetch_index_and_filter_in_cache=true, max_file_size_for_l0_meta_pin=0, file_temperature=<optimized out>) at db/table_cache.cc:160
#16 0x00007ffff7964025 in rocksdb::TableCache::FindTable (this=0x5555555fc230, ro=..., file_options=..., internal_comparator=..., file_meta=..., handle=0x7fffffff9168, block_protection_bytes_per_key=0 '\000', 
    prefix_extractor=std::shared_ptr<const rocksdb::SliceTransform> (empty) = {...}, no_io=<optimized out>, file_read_hist=0x0, skip_filters=false, level=-1, prefetch_index_and_filter_in_cache=true, 
    max_file_size_for_l0_meta_pin=0, file_temperature=rocksdb::Temperature::kUnknown) at db/table_cache.cc:194
#17 0x00007ffff7965c53 in rocksdb::TableCache::GetTableProperties (this=0x5555555fc230, file_options=..., read_options=..., internal_comparator=..., file_meta=..., properties=0x7fffffff92c0, 
    block_protection_bytes_per_key=0 '\000', prefix_extractor=std::shared_ptr<const rocksdb::SliceTransform> (empty) = {...}, no_io=false) at db/table_cache.cc:610
#18 0x00007ffff79533e0 in rocksdb::(anonymous namespace)::Repairer::ScanTable (t=0x7fffffff9510, this=0x7fffffffc3f0) at db/repair.cc:538
#19 rocksdb::(anonymous namespace)::Repairer::ExtractMetaData (this=0x7fffffffc3f0) at db/repair.cc:508
#20 0x00007ffff7958cc3 in rocksdb::(anonymous namespace)::Repairer::Run (this=0x7fffffffc3f0) at db/repair.cc:223
#21 0x00007ffff795b536 in rocksdb::RepairDB (dbname="/data/kvrocks/db/", options=...) at db/repair.cc:856
#22 0x00007ffff7f7139e in rocksdb::RepairCommand::DoCommand (this=0x5555555bf4c0) at tools/ldb_cmd.cc:3623
#23 0x00007ffff7f910ff in rocksdb::LDBCommand::Run (this=this@entry=0x5555555bf4c0) at tools/ldb_cmd.cc:368
#24 0x00007ffff7fa75cb in rocksdb::LDBCommandRunner::RunCommand (argc=<optimized out>, argv=<optimized out>, options=..., ldb_options=..., column_families=<optimized out>) at tools/ldb_tool.cc:165
#25 0x00007ffff7fa877b in rocksdb::LDBTool::Run (this=this@entry=0x7fffffffe0ff, argc=argc@entry=3, argv=argv@entry=0x7fffffffe848, options=..., ldb_options=..., column_families=column_families@entry=0x0)
    at tools/ldb_tool.cc:178
#26 0x0000555555555132 in main (argc=3, argv=0x7fffffffe848) at tools/ldb.cc:11

I've already checked the ulimits, and I have plenty of disk and memory available:

ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 30446
max locked memory           (kbytes, -l) unlimited
max memory size             (kbytes, -m) unlimited
open files                          (-n) 131072
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 10240
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) unlimited
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited
df -h /data/kvrocks/db/
Filesystem      Size  Used Avail Use% Mounted on
/dev/md127      885G   94G  792G  11% /data
free -h
               total        used        free      shared  buff/cache   available
Mem:           123Gi       3.4Gi        49Gi        58Mi        70Gi       119Gi
Swap:             0B          0B          0B

I tried repairing the DB on an EC2 instance with less available memory than the container, and it worked without issue.
Inside the container ldb fails, and I haven't pinpointed the problem yet.
I made a sysdig capture of the ldb process running in the container: it tries to allocate approximately 6.25 exabytes, so something is definitely going wrong when ldb runs in the container.
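
A side note for others debugging the same symptom: `ulimit -a` and `free -h` do not show a memory cap imposed by the container runtime, because that cap lives in the cgroup. The sketch below reads it, assuming the usual `/sys/fs/cgroup` mount point; `cgroup_mem_limit` is a hypothetical helper name. A request of several exabytes would fail under any limit, so this check only rules out a low cap as a contributing factor.

```shell
#!/bin/sh
# Hypothetical helper: print the memory limit enforced on this cgroup.
# The root directory is a parameter only so the function can be pointed
# at another location; /sys/fs/cgroup is the usual mount point.
cgroup_mem_limit() {
    cgroup_root=${1:-/sys/fs/cgroup}
    if [ -r "$cgroup_root/memory.max" ]; then
        # cgroup v2: a byte count, or the literal string "max" (no limit)
        cat "$cgroup_root/memory.max"
    elif [ -r "$cgroup_root/memory/memory.limit_in_bytes" ]; then
        # cgroup v1: a byte count (a very large value means no real limit)
        cat "$cgroup_root/memory/memory.limit_in_bytes"
    else
        echo "unknown"
        return 1
    fi
}
```

Running `cgroup_mem_limit` with no argument inside the container prints the limit that applies to the ldb process.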

I'm building the project with:

ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && apt install -y g++ libgflags-dev libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev libzstd-dev git build-essential cmake libtool python3 libssl-dev && apt-get clean
ARG ROCKSDB_VERSION=v8.11.4
ARG MAKE_PARALLEL=6
ARG INSTALL_DIR=/tmp/rocksdb
RUN git clone --depth 1 -b ${ROCKSDB_VERSION} https://github.com/facebook/rocksdb.git ${INSTALL_DIR}
WORKDIR ${INSTALL_DIR}
ENV CXXFLAGS='-Wno-error=deprecated-copy -Wno-error=pessimizing-move -Wno-error=class-memaccess -frtti'
RUN make -j${MAKE_PARALLEL} all

It turns out that using 8.11.4 was the real issue.
Using 7.9.2 or the latest 9.6.1 works. Here is a reference Dockerfile for others hitting the same issue:

FROM debian:bullseye as builder

# using latest because ldb on 8.11.4 doesn't compile right
# ldb repair from 9.6.1 has been tested and works with no problems.
ARG ROCKSDB_VERSION=9.6.1

ARG MAKE_PARALLEL=6
ARG BUILD_DIR="/opt/rocksdb"

RUN apt-get update \
   && apt-get install --no-install-recommends -y \
   dumb-init supervisor ca-certificates redis-tools libsnappy-dev libgflags-dev \
   libbz2-dev liblz4-dev libzstd-dev zlib1g-dev build-essential curl \
   && rm -rf /var/lib/apt/lists/*

RUN mkdir ${BUILD_DIR} \
   && curl -sSL https://github.com/facebook/rocksdb/archive/refs/tags/v${ROCKSDB_VERSION}.tar.gz \
   | tar zxC ${BUILD_DIR} --strip-components=1 \
   && cd ${BUILD_DIR} && DEBUG_LEVEL=0 LIB_MODE=shared PORTABLE=1 make -j${MAKE_PARALLEL} tools
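
Since repair is best-effort and may drop data it cannot read, it seems prudent to copy the DB directory before running the rebuilt ldb against it. A minimal sketch, assuming the `ldb repair --db=...` invocation used earlier in this issue; `repair_with_backup` and the `.bak.<timestamp>` naming are hypothetical choices, not anything provided by RocksDB.

```shell
#!/bin/sh
# Hypothetical helper: back up a RocksDB directory, then run `ldb repair`.
# Usage: repair_with_backup <path-to-ldb> <db-dir>
repair_with_backup() {
    ldb_bin=$1
    db_dir=${2%/}    # strip one trailing slash so the backup name is clean
    backup_dir="${db_dir}.bak.$(date +%Y%m%d%H%M%S)"

    [ -x "$ldb_bin" ] || { echo "ldb not executable: $ldb_bin" >&2; return 1; }
    [ -d "$db_dir" ]  || { echo "no such db dir: $db_dir" >&2; return 1; }

    # Copy first, so a failed or lossy repair can be rolled back.
    cp -a "$db_dir" "$backup_dir" || return 1
    echo "backup: $backup_dir"

    # Same command as in the issue, run against the original directory.
    "$ldb_bin" repair --db="$db_dir"
}
```

With the paths from this issue the call would be `repair_with_backup /opt/rocksdb/ldb /data/kvrocks/db/`. The copy needs as much free space as the DB itself.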