rpm-software-management/libdnf

Need to gracefully handle failure from `solv_xfopen` in `load_yum_repo`

jlebon opened this issue · 6 comments

We've received an rpm-ostree crash dump showing a SIGSEGV going through libdnf:

(gdb) bt
#0  0x00007f133ed74d72 in __GI__IO_fread (buf=buf@entry=0x7f12d6ff9e80, size=size@entry=1, count=count@entry=8192, fp=fp@entry=0x0)
    at iofread.c:37
#1  0x00007f133f880cca in fread (__stream=0x0, __n=8192, __size=1, __ptr=0x7f12d6ff9e80) at /usr/include/bits/stdio2.h:293
#2  solv_xmlparser_parse (xmlp=0x7f12d6ffc0d8, fp=0x0) at /usr/src/debug/libsolv-0.7.22-1.fc36.x86_64/ext/solv_xmlparser.c:354
#3  0x00007f133f8758a0 in repo_add_rpmmd (repo=0x7f122baf5780, fp=0x0, language=<optimized out>, flags=0)
    at /usr/src/debug/libsolv-0.7.22-1.fc36.x86_64/ext/repo_rpmmd.c:1138
#4  0x000055fffb257a73 in load_yum_repo (error=0x7f12d6ffc840, hrepo=<optimized out>, sack=0x7f12d8798270)
    at /usr/src/debug/rpm-ostree-2022.9-1.fc36.x86_64/libdnf/libdnf/dnf-sack.cpp:794
#5  dnf_sack_load_repo (error=0x7f12d6ffc840, flags=3, repo=0x7f12b8d9d980, sack=0x7f12d8798270)
    at /usr/src/debug/rpm-ostree-2022.9-1.fc36.x86_64/libdnf/libdnf/dnf-sack.cpp:1846
#6  dnf_sack_add_repo (error=0x7f12d6ffc840, state=0x7f12d800caf0, flags=DNF_SACK_ADD_FLAG_FILELISTS, permissible_cache_age=4294967295,
    repo=0x7f1243f4c300, sack=0x7f12d8798270) at /usr/src/debug/rpm-ostree-2022.9-1.fc36.x86_64/libdnf/libdnf/dnf-sack.cpp:2285
#7  dnf_sack_add_repos (error=0x7f12d6ffc840, state=0x7f12d800c930, flags=DNF_SACK_ADD_FLAG_FILELISTS,
    permissible_cache_age=<optimized out>, repos=0x7f1233c45a60, sack=<optimized out>)
    at /usr/src/debug/rpm-ostree-2022.9-1.fc36.x86_64/libdnf/libdnf/dnf-sack.cpp:2339
#8  dnf_context_setup_sack_with_flags (context=<optimized out>, state=0x7f12d800c930, flags=<optimized out>, error=0x7f12d6ffc840)
    at /usr/src/debug/rpm-ostree-2022.9-1.fc36.x86_64/libdnf/libdnf/dnf-context.cpp:1827
#9  0x000055fffb217b6f in rpmostree_context_download_metadata (self=0x7f12cc0ef090, flags=DNF_CONTEXT_SETUP_SACK_FLAG_SKIP_RPMDB,
    cancellable=0x7f1291482720, error=0x7f12d6ffc840) at src/libpriv/rpmostree-core.cxx:1073
#10 0x000055fffb20714a in refresh_md_transaction_execute (transaction=<optimized out>, cancellable=0x7f1291482720, error=0x7f12d6ffc840)
    at src/daemon/rpmostreed-transaction-types.cxx:2274
#11 0x000055fffb2005f7 in transaction_execute_thread (task=0x7f12d80c20e0, source_object=<optimized out>, task_data=<optimized out>,
    cancellable=0x7f1291482720) at src/daemon/rpmostreed-transaction.cxx:338
#12 0x00007f133fbe3b03 in initable_init (initable=<optimized out>, cancellable=<optimized out>, error=0x7f133fd0c430)
    at ../gio/gsubprocess.c:335

The interesting bit here is notice how the fp argument to repo_add_rpmmd is 0 and I think that's what ends up causing the fault in glibc. Digging more with gdb, we're somehow led to a seemingly impossible situation. A screenshot from my gdb session I think is probably easier:

image

So somehow fp_primary is NULL, yet there's clearly an assert(fp_primary) there after it's assigned. So there's potentially some memory corruption going on. I'll let the reporter (@sgallagher) chime in on whether this is reproducible or if we can just blame it on a cosmic ray.

The block of code at the exact commit is here.

Edit: One possibility is that assert is being compiled out, though I didn't find any clear evidence of that perusing koji builds for NDEBUG.

Be aware that assert() is optimized out when compiled with optimizations of -O1 or higher. So that assertion is completely ignored at runtime. A better option in the code here would be:

fp_primary = solv_xfopen(primary.c_str(), "r");
if (G_UNLIKELY (!fp_primary)) {
  g_set_error(...);
  retval = FALSE;
  goto out;
}

Be aware that assert() is optimized out when compiled with optimizations of -O1 or higher.

Hmm I'm trying to find a source for this, but can't. At least assert(3) says it's a no-op if NDEBUG is defined, but also recommends against it.

OK yeah, according to https://stackoverflow.com/a/34314956/308136, cmake in release mode will set NDEBUG. Hmm, this should be in the DWARF info too; will sanity-check.

Anyway, agreed regarding reworking the code!

And of course, because we bundle libdnf in rpm-ostree, and the libdnf compilation is hidden behind a cargo build, that didn't show up when I searched the rpm-ostree build log.

Yep, that code is a shining example of how not to do error handling (there's a lot of mishandled return codes). We've set this straight in dnf 5, but probably at least this case deserves better handling in dnf 4.