Need to gracefully handle failure from `solv_xfopen` in `load_yum_repo`
jlebon opened this issue · 6 comments
We've received an rpm-ostree crash dump showing a SIGSEGV going through libdnf:
(gdb) bt
#0 0x00007f133ed74d72 in __GI__IO_fread (buf=buf@entry=0x7f12d6ff9e80, size=size@entry=1, count=count@entry=8192, fp=fp@entry=0x0)
at iofread.c:37
#1 0x00007f133f880cca in fread (__stream=0x0, __n=8192, __size=1, __ptr=0x7f12d6ff9e80) at /usr/include/bits/stdio2.h:293
#2 solv_xmlparser_parse (xmlp=0x7f12d6ffc0d8, fp=0x0) at /usr/src/debug/libsolv-0.7.22-1.fc36.x86_64/ext/solv_xmlparser.c:354
#3 0x00007f133f8758a0 in repo_add_rpmmd (repo=0x7f122baf5780, fp=0x0, language=<optimized out>, flags=0)
at /usr/src/debug/libsolv-0.7.22-1.fc36.x86_64/ext/repo_rpmmd.c:1138
#4 0x000055fffb257a73 in load_yum_repo (error=0x7f12d6ffc840, hrepo=<optimized out>, sack=0x7f12d8798270)
at /usr/src/debug/rpm-ostree-2022.9-1.fc36.x86_64/libdnf/libdnf/dnf-sack.cpp:794
#5 dnf_sack_load_repo (error=0x7f12d6ffc840, flags=3, repo=0x7f12b8d9d980, sack=0x7f12d8798270)
at /usr/src/debug/rpm-ostree-2022.9-1.fc36.x86_64/libdnf/libdnf/dnf-sack.cpp:1846
#6 dnf_sack_add_repo (error=0x7f12d6ffc840, state=0x7f12d800caf0, flags=DNF_SACK_ADD_FLAG_FILELISTS, permissible_cache_age=4294967295,
repo=0x7f1243f4c300, sack=0x7f12d8798270) at /usr/src/debug/rpm-ostree-2022.9-1.fc36.x86_64/libdnf/libdnf/dnf-sack.cpp:2285
#7 dnf_sack_add_repos (error=0x7f12d6ffc840, state=0x7f12d800c930, flags=DNF_SACK_ADD_FLAG_FILELISTS,
permissible_cache_age=<optimized out>, repos=0x7f1233c45a60, sack=<optimized out>)
at /usr/src/debug/rpm-ostree-2022.9-1.fc36.x86_64/libdnf/libdnf/dnf-sack.cpp:2339
#8 dnf_context_setup_sack_with_flags (context=<optimized out>, state=0x7f12d800c930, flags=<optimized out>, error=0x7f12d6ffc840)
at /usr/src/debug/rpm-ostree-2022.9-1.fc36.x86_64/libdnf/libdnf/dnf-context.cpp:1827
#9 0x000055fffb217b6f in rpmostree_context_download_metadata (self=0x7f12cc0ef090, flags=DNF_CONTEXT_SETUP_SACK_FLAG_SKIP_RPMDB,
cancellable=0x7f1291482720, error=0x7f12d6ffc840) at src/libpriv/rpmostree-core.cxx:1073
#10 0x000055fffb20714a in refresh_md_transaction_execute (transaction=<optimized out>, cancellable=0x7f1291482720, error=0x7f12d6ffc840)
at src/daemon/rpmostreed-transaction-types.cxx:2274
#11 0x000055fffb2005f7 in transaction_execute_thread (task=0x7f12d80c20e0, source_object=<optimized out>, task_data=<optimized out>,
cancellable=0x7f1291482720) at src/daemon/rpmostreed-transaction.cxx:338
#12 0x00007f133fbe3b03 in initable_init (initable=<optimized out>, cancellable=<optimized out>, error=0x7f133fd0c430)
at ../gio/gsubprocess.c:335
The interesting bit here is notice how the fp
argument to repo_add_rpmmd
is 0 and I think that's what ends up causing the fault in glibc. Digging more with gdb, we're somehow led to a seemingly impossible situation. A screenshot from my gdb session I think is probably easier:
So somehow fp_primary
is NULL, yet there's clearly an assert(fp_primary)
there after it's assigned. So there's potentially some memory corruption going on. I'll let the reporter (@sgallagher) chime in on whether this is reproducible or if we can just blame it on a cosmic ray.
The block of code at the exact commit is here.
Edit: One possibility is that assert
is being compiled out, though I didn't find any clear evidence of that perusing koji builds for NDEBUG
.
Be aware that assert()
is optimized out when compiled with optimizations of -O1
or higher. So that assertion is completely ignored at runtime. A better option in the code here would be:
fp_primary = solv_xfopen(primary.c_str(), "r");
if (G_UNLIKELY (!fp_primary)) {
g_set_error(...);
retval = FALSE;
goto out;
}
Be aware that
assert()
is optimized out when compiled with optimizations of-O1
or higher.
Hmm I'm trying to find a source for this, but can't. At least assert(3)
says it's a no-op if NDEBUG
is defined, but also recommends against it.
OK yeah, according to https://stackoverflow.com/a/34314956/308136, cmake in release mode will set NDEBUG
. Hmm, this should be in the DWARF info too; will sanity-check.
Anyway, agreed regarding reworking the code!
https://kojipkgs.fedoraproject.org//packages/libdnf/0.67.0/2.fc36/data/logs/x86_64/build.log shows that libdnf was indeed built with -DNDEBUG
And of course, because we bundle libdnf in rpm-ostree, and the libdnf compilation is hidden behind a cargo build
, that didn't show up when I searched the rpm-ostree build log.
Yep, that code is a shining example of how not to do error handling (there's a lot of mishandled return codes). We've set this straight in dnf 5, but probably at least this case deserves better handling in dnf 4.