linux-nvme/libnvme

Double-free error

martin-belanger opened this issue · 5 comments

This is to track the issue we previously discussed. Note that this may have already been fixed by #512.

One of our testers was running SLES15 SP4. He observed the following double-free error, triggered by the nvme-cli auto-connect udev rule (70-nvmf-autoconnect.rules).

2022-10-13T09:35:44.949836-04:00 e2e-l4-180035 systemd[1]: Started NVMf auto-connect scan upon nvme discovery controller Events.
2022-10-13T09:35:44.953784-04:00 e2e-l4-180035 stafd[15032]: (tcp, 172.16.10.254, 8009, nqn.1988-11.com.dell:SFSS:1:20220607102307e8, vlan1610_25G) | nvme0 - Received AEN: 0x70f002
2022-10-13T09:35:44.972915-04:00 e2e-l4-180035 stafd[15032]: (tcp, 172.16.10.254, 8009, nqn.1988-11.com.dell:SFSS:1:20220607102307e8, vlan1610_25G) | nvme0 - Received discovery log pages (num records=47).
2022-10-13T09:35:45.204248-04:00 e2e-l4-180035 sh[27124]: failed to resolve host none info
2022-10-13T09:35:45.430830-04:00 e2e-l4-180035 sh[27124]: message repeated 46 times: [ failed to resolve host none info]
2022-10-13T09:35:45.434596-04:00 e2e-l4-180035 sh[27124]: free(): double free detected in tcache 2
2022-10-13T09:35:45.455031-04:00 e2e-l4-180035 systemd[1]: Started Process Core Dump (PID 27177/UID 0).
2022-10-13T09:35:45.637417-04:00 e2e-l4-180035 systemd-coredump[27178]: Process 27124 (nvme) of user 0 dumped core.

Found module linux-vdso.so.1 with build-id: 267a3e00bc5c234ec8c0c4c68ca478a2844e3338
Found module libpthread.so.0 with build-id: a0ff2a2c46cd120e7406351499c8ed73f7c11098
Found module libdl.so.2 with build-id: e76bf98a5f8b1af5ba349496c5e56e85b0701d84
Found module ld-linux-x86-64.so.2 with build-id: 731fe3b33b5e43b0a0630c430313ab2eb78f75db
Found module libcrypto.so.1.1 with build-id: cac3d636b7b29f085ea4ef6f7144ce89da7028ee
Found module libc.so.6 with build-id: 28910b266cdd8f0c54c7830b758e4a1339f255c1
Found module libhugetlbfs.so with build-id: d53677a8da49d286b7b4b5cf4a2e2de56b5b1668
Found module libz.so.1 with build-id: 334f2e2038235d5365fbed5e4f2dc9a59d9e5aba
Found module libjson-c.so.3 with build-id: d9e0dba9fb16b2ac9d395fefa60489821ff53cef
Found module libuuid.so.1 with build-id: 208d15ce0b482af1b78379c3990614531e4c9070
Found module libnvme.so.1 with build-id: f8c7bddd044314b38a66d40db1ed208cde8e9750
Found module nvme with build-id: 2a9943b3e85a4f3c8358cc3fc02bcc8ffdc9a252
Stack trace of thread 27124:
#0  0x00007f9e662cacdb raise (libc.so.6 + 0x4acdb)
#1  0x00007f9e662cc375 abort (libc.so.6 + 0x4c375)
#2  0x00007f9e66310b07 __libc_message (libc.so.6 + 0x90b07)
#3  0x00007f9e66318b8a malloc_printerr (libc.so.6 + 0x98b8a)
#4  0x00007f9e6631a9dd _int_free (libc.so.6 + 0x9a9dd)
#5  0x00007f9e66ed04a7 n/a (libnvme.so.1 + 0x104a7)
#6  0x00007f9e66ed051b n/a (libnvme.so.1 + 0x1051b)
#7  0x00007f9e66ed062b n/a (libnvme.so.1 + 0x1062b)
#8  0x00007f9e66ed06bb nvme_free_tree (libnvme.so.1 + 0x106bb)
#9  0x000055b321c0f20e n/a (nvme + 0xf20e)
#10 0x000055b321c3fac8 n/a (nvme + 0x3fac8)
#11 0x000055b321c0d3f6 n/a (nvme + 0xd3f6)
#12 0x00007f9e662b52bd __libc_start_main (libc.so.6 + 0x352bd)
#13 0x000055b321c0d57a n/a (nvme + 0xd57a)

2022-10-13T09:35:45.640411-04:00 e2e-l4-180035 systemd[1]: nvmf-connect@--device\x3dnvme0\t--transport\x3dtcp\t--traddr\x3d172.16.10.254\t--trsvcid\x3d8009\t--host-traddr\x3dnone.service: Main process exited, code=dumped, status=6/ABRT
2022-10-13T09:35:45.640593-04:00 e2e-l4-180035 systemd[1]: nvmf-connect@--device\x3dnvme0\t--transport\x3dtcp\t--traddr\x3d172.16.10.254\t--trsvcid\x3d8009\t--host-traddr\x3dnone.service: Failed with result 'core-dump'.
2022-10-13T09:35:45.642282-04:00 e2e-l4-180035 systemd[1]: systemd-coredump@3-27177-0.service: Deactivated successfully.
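
For reference, the abort happens while the scanned topology is being torn down (frame #8, nvme_free_tree). Below is a minimal sketch of the path nvme-cli ends up exercising, assuming only the public libnvme API (nvme_scan() and nvme_free_tree()); it is illustrative, not the actual connect-all code, and it will not by itself recreate the discovery-controller setup from the logs above.

#include <stdio.h>
#include <libnvme.h>

int main(void)
{
	nvme_root_t r;

	/* Scan the local NVMe topology; NULL means no JSON config file. */
	r = nvme_scan(NULL);
	if (!r) {
		fprintf(stderr, "nvme_scan failed\n");
		return 1;
	}

	/* ... discovery / connect work happens here in nvme-cli ... */

	/* Teardown: this is the call in frame #8 of the stack trace. */
	nvme_free_tree(r);
	return 0;
}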
igaw commented

Is it still happening with the fix from #512?

martin-belanger commented

We still need to test for it. Our testers mostly test with "vanilla" SLES15 SP4, so I'm going to need to manually patch their systems. I will let you know what we find; if we can't reproduce it, I'll just close this ticket.
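
If it helps, here is a hypothetical stress helper (not part of nvme-cli) for checking a patched build: it just builds and frees the topology in a loop so that valgrind or glibc's malloc checking can flag any remaining double free. It assumes the public nvme_scan()/nvme_free_tree() API and a libnvme pkg-config file; the real trigger still needs the discovery log page handling shown in the journal above.

/*
 * Build (names are illustrative):
 *   gcc -o tree_stress tree_stress.c $(pkg-config --cflags --libs libnvme)
 * Run under a memory checker, e.g.:
 *   valgrind --error-exitcode=1 ./tree_stress
 */
#include <stdio.h>
#include <libnvme.h>

int main(void)
{
	for (int i = 0; i < 100; i++) {
		nvme_root_t r = nvme_scan(NULL);

		if (!r) {
			fprintf(stderr, "nvme_scan failed on iteration %d\n", i);
			return 1;
		}
		/* Any double free in the teardown is reported (or aborts) here. */
		nvme_free_tree(r);
	}
	printf("completed 100 scan/free cycles\n");
	return 0;
}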

igaw commented

Obviously, someone at SUSE should be able to provide you with a test rpm :)

martin-belanger commented

Closing. Our testers could not reproduce the issue with libnvme 1.2.

igaw commented

Thanks for the feedback. We will address this downstream by backporting the change from #512.