Valgrind Reports Invalid Memory Read in prun
samuelkgutierrez opened this issue
Background information
What version of the PMIx Reference RTE (PRRTE) are you using? (e.g., v2.0, v3.0, git master @ hash, etc.)
What version of PMIx are you using? (e.g., v4.2.0, git branch name and hash, etc.)
Please describe the system on which you are running
- Operating system/version: Linux ubuntu-22 5.19.0-29-generic
Details of the problem
I'm not sure whether this problem should be reported against OpenPMIx or here, so I'm reporting it here. When testing in a VM on my laptop, Valgrind reports the following:
==72815== Conditional jump or move depends on uninitialised value(s)
==72815== at 0x4C0C76C: pmix_bfrops_base_pack_bool (bfrop_base_pack.c:108)
==72815== by 0x4C0F88E: pmix_bfrops_base_pack_val (bfrop_base_pack.c:995)
==72815== by 0x4C0DFB0: pmix_bfrops_base_pack_info (bfrop_base_pack.c:600)
==72815== by 0x4C0C663: pmix_bfrops_base_pack_buffer (bfrop_base_pack.c:80)
==72815== by 0x4C0C53C: pmix_bfrops_base_pack (bfrop_base_pack.c:62)
==72815== by 0x4C36365: pmix41_pack (bfrop_pmix41.c:381)
==72815== by 0x4AD962C: PMIx_Spawn_nb (pmix_client_spawn.c:333)
==72815== by 0x4AD6D8A: PMIx_Spawn (pmix_client_spawn.c:104)
==72815== by 0x48AAFB5: prun_common (prun_common.c:757)
==72815== by 0x109B7B: prun (prun.c:202)
==72815== by 0x10940C: main (main.c:13)
==72815== Uninitialised value was created by a stack allocation
==72815== at 0x48A85FA: prun_common (prun_common.c:316)
==72815==
Client ns prte-ubuntu-22-33396@7 rank 0 pid 72819: Running on host ubuntu-22 localrank 0
Client ns prte-ubuntu-22-33396@7 rank 0: Finalizing
Client ns prte-ubuntu-22-33396@7 rank 0:PMIx_Finalize successfully completed
==72815== Invalid read of size 4
==72815== at 0x4CFB01E: pmix_ptl_close (ptl_base_frame.c:246)
==72815== by 0x4BCAEF7: pmix_mca_base_framework_close (pmix_mca_base_framework.c:213)
==72815== by 0x4B66473: pmix_rte_finalize (pmix_finalize.c:81)
==72815== by 0x4B7D4EE: PMIx_tool_finalize (pmix_tool.c:1549)
==72815== by 0x48AC2E3: prun_common (prun_common.c:873)
==72815== by 0x109B7B: prun (prun.c:202)
==72815== by 0x10940C: main (main.c:13)
==72815== Address 0x531315c is 172 bytes inside a block of size 1,736 free'd
==72815== at 0x484727F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==72815== by 0x4B7C487: PMIx_tool_finalize (pmix_tool.c:1531)
==72815== by 0x48AC2E3: prun_common (prun_common.c:873)
==72815== by 0x109B7B: prun (prun.c:202)
==72815== by 0x10940C: main (main.c:13)
==72815== Block was alloc'd at
==72815== at 0x4844899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==72815== by 0x4B6CAD9: pmix_tma_malloc (pmix_object.h:248)
==72815== by 0x4B6CD88: pmix_obj_new_tma (pmix_object.h:693)
==72815== by 0x4B6CB45: pmix_obj_new_debug_tma (pmix_object.h:426)
==72815== by 0x4B71A94: PMIx_tool_init (pmix_tool.c:650)
==72815== by 0x48A92C4: prun_common (prun_common.c:512)
==72815== by 0x109B7B: prun (prun.c:202)
==72815== by 0x10940C: main (main.c:13)
To reproduce, execute the following:
$ prte&
$ valgrind --trace-children=yes --leak-check=full --track-origins=yes prun -n 1 ./examples/hello
I've looked into it a bit, but someone more familiar with the code could probably track it down faster.
Here is some additional info from AddressSanitizer that might be useful:
=================================================================
==177436==ERROR: AddressSanitizer: heap-use-after-free on address 0x61c00000092c at pc 0x7efdc8df5a4b bp 0x7ffdd74c7bd0 sp 0x7ffdd74c7bc8
READ of size 4 at 0x61c00000092c thread T0
#0 0x7efdc8df5a4a in pmix_ptl_close /home/samuel/devel/openpmix/src/mca/ptl/base/ptl_base_frame.c:246:48
#1 0x7efdc8ab6913 in pmix_mca_base_framework_close /home/samuel/devel/openpmix/src/mca/base/pmix_mca_base_framework.c:213:19
#2 0x7efdc898f6f9 in pmix_rte_finalize /home/samuel/devel/openpmix/src/runtime/pmix_finalize.c:81:12
#3 0x7efdc89ce010 in PMIx_tool_finalize /home/samuel/devel/openpmix/src/tool/pmix_tool.c:1549:5
#4 0x7efdc931a398 in prun_common /home/samuel/devel/prrte/src/prted/prun_common.c:873:11
#5 0x55c42c141fbc in prun /home/samuel/devel/prrte/src/tools/prun/prun.c:202:10
#6 0x55c42c140fb1 in main /home/samuel/devel/prrte/src/tools/prun/main.c:13:12
#7 0x7efdc822350f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#8 0x7efdc82235c8 in __libc_start_main csu/../csu/libc-start.c:381:3
#9 0x55c42c081404 in _start (/home/samuel/local/prrte/bin/prun+0x1e404) (BuildId: 48b89e8c1a73ff23da8b22f6c11fb0011733abda)
0x61c00000092c is located 172 bytes inside of 1736-byte region [0x61c000000880,0x61c000000f48)
freed by thread T0 here:
#0 0x55c42c106b82 in free (/home/samuel/local/prrte/bin/prun+0xa3b82) (BuildId: 48b89e8c1a73ff23da8b22f6c11fb0011733abda)
#1 0x7efdc89cbc4e in PMIx_tool_finalize /home/samuel/devel/openpmix/src/tool/pmix_tool.c:1531:13
#2 0x7efdc931a398 in prun_common /home/samuel/devel/prrte/src/prted/prun_common.c:873:11
#3 0x55c42c141fbc in prun /home/samuel/devel/prrte/src/tools/prun/prun.c:202:10
#4 0x55c42c140fb1 in main /home/samuel/devel/prrte/src/tools/prun/main.c:13:12
#5 0x7efdc822350f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
previously allocated by thread T0 here:
#0 0x55c42c106e2e in __interceptor_malloc (/home/samuel/local/prrte/bin/prun+0xa3e2e) (BuildId: 48b89e8c1a73ff23da8b22f6c11fb0011733abda)
#1 0x7efdc89d7d70 in pmix_tma_malloc /home/samuel/devel/openpmix/src/class/pmix_object.h:248:16
#2 0x7efdc89d796d in pmix_obj_new_tma /home/samuel/devel/openpmix/src/class/pmix_object.h:693:32
#3 0x7efdc89b3a03 in pmix_obj_new_debug_tma /home/samuel/devel/openpmix/src/class/pmix_object.h:426:29
#4 0x7efdc89a1036 in PMIx_tool_init /home/samuel/devel/openpmix/src/tool/pmix_tool.c:650:36
#5 0x7efdc9311e1b in prun_common /home/samuel/devel/prrte/src/prted/prun_common.c:512:32
#6 0x55c42c141fbc in prun /home/samuel/devel/prrte/src/tools/prun/prun.c:202:10
#7 0x55c42c140fb1 in main /home/samuel/devel/prrte/src/tools/prun/main.c:13:12
#8 0x7efdc822350f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
It looks like the following resolves the ASAN issue:
diff --git a/src/tool/pmix_tool.c b/src/tool/pmix_tool.c
index 0f867b5e..d063e44c 100644
--- a/src/tool/pmix_tool.c
+++ b/src/tool/pmix_tool.c
@@ -1512,7 +1512,7 @@ PMIX_EXPORT pmix_status_t PMIx_tool_finalize(void)
pmix_iof_static_dump_output(&pmix_client_globals.iof_stdout);
pmix_iof_static_dump_output(&pmix_client_globals.iof_stderr);
- PMIX_RELEASE(pmix_client_globals.myserver);
+ //PMIX_RELEASE(pmix_client_globals.myserver);
PMIX_LIST_DESTRUCT(&pmix_client_globals.pending_requests);
for (n = 0; n < pmix_client_globals.peers.size; n++) {
if (NULL
Edit: It looks like pmix_client_globals.myserver may be accessed at ptl_base_frame.c:246 after it is freed:
if (0 <= pmix_client_globals.myserver->sd) {
It looks like the Valgrind invalid-read report also goes away with that change. That said, this probably isn't the right fix, so I'll let @rhc54 decide how to proceed.