openpmix/prrte

Valgrind Reports Invalid Memory Read in prun

samuelkgutierrez opened this issue · 3 comments


Background information

What version of the PMIx Reference RTE (PRRTE) are you using? (e.g., v2.0, v3.0, git master @ hash, etc.)

4f652de

What version of PMIx are you using? (e.g., v4.2.0, git branch name and hash, etc.)

openpmix/openpmix@2165694

Please describe the system on which you are running

  • Operating system/version: Linux ubuntu-22 5.19.0-29-generic

Details of the problem

I'm not sure whether this belongs on the OpenPMIx side or here, so I'm reporting it here. Valgrind reports the following when I test in a VM on my laptop:

==72815== Conditional jump or move depends on uninitialised value(s)
==72815==    at 0x4C0C76C: pmix_bfrops_base_pack_bool (bfrop_base_pack.c:108)
==72815==    by 0x4C0F88E: pmix_bfrops_base_pack_val (bfrop_base_pack.c:995)
==72815==    by 0x4C0DFB0: pmix_bfrops_base_pack_info (bfrop_base_pack.c:600)
==72815==    by 0x4C0C663: pmix_bfrops_base_pack_buffer (bfrop_base_pack.c:80)
==72815==    by 0x4C0C53C: pmix_bfrops_base_pack (bfrop_base_pack.c:62)
==72815==    by 0x4C36365: pmix41_pack (bfrop_pmix41.c:381)
==72815==    by 0x4AD962C: PMIx_Spawn_nb (pmix_client_spawn.c:333)
==72815==    by 0x4AD6D8A: PMIx_Spawn (pmix_client_spawn.c:104)
==72815==    by 0x48AAFB5: prun_common (prun_common.c:757)
==72815==    by 0x109B7B: prun (prun.c:202)
==72815==    by 0x10940C: main (main.c:13)
==72815==  Uninitialised value was created by a stack allocation
==72815==    at 0x48A85FA: prun_common (prun_common.c:316)
==72815== 
Client ns prte-ubuntu-22-33396@7 rank 0 pid 72819: Running on host ubuntu-22 localrank 0
Client ns prte-ubuntu-22-33396@7 rank 0: Finalizing
Client ns prte-ubuntu-22-33396@7 rank 0:PMIx_Finalize successfully completed
==72815== Invalid read of size 4
==72815==    at 0x4CFB01E: pmix_ptl_close (ptl_base_frame.c:246)
==72815==    by 0x4BCAEF7: pmix_mca_base_framework_close (pmix_mca_base_framework.c:213)
==72815==    by 0x4B66473: pmix_rte_finalize (pmix_finalize.c:81)
==72815==    by 0x4B7D4EE: PMIx_tool_finalize (pmix_tool.c:1549)
==72815==    by 0x48AC2E3: prun_common (prun_common.c:873)
==72815==    by 0x109B7B: prun (prun.c:202)
==72815==    by 0x10940C: main (main.c:13)
==72815==  Address 0x531315c is 172 bytes inside a block of size 1,736 free'd
==72815==    at 0x484727F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==72815==    by 0x4B7C487: PMIx_tool_finalize (pmix_tool.c:1531)
==72815==    by 0x48AC2E3: prun_common (prun_common.c:873)
==72815==    by 0x109B7B: prun (prun.c:202)
==72815==    by 0x10940C: main (main.c:13)
==72815==  Block was alloc'd at
==72815==    at 0x4844899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==72815==    by 0x4B6CAD9: pmix_tma_malloc (pmix_object.h:248)
==72815==    by 0x4B6CD88: pmix_obj_new_tma (pmix_object.h:693)
==72815==    by 0x4B6CB45: pmix_obj_new_debug_tma (pmix_object.h:426)
==72815==    by 0x4B71A94: PMIx_tool_init (pmix_tool.c:650)
==72815==    by 0x48A92C4: prun_common (prun_common.c:512)
==72815==    by 0x109B7B: prun (prun.c:202)
==72815==    by 0x10940C: main (main.c:13)

To reproduce, execute the following:

$ prte&
$ valgrind --trace-children=yes --leak-check=full --track-origins=yes prun -n 1 ./examples/hello

I've looked at it some, but more familiar eyes might expedite the search.

Here is some additional info from AddressSanitizer that might be useful:

=================================================================
==177436==ERROR: AddressSanitizer: heap-use-after-free on address 0x61c00000092c at pc 0x7efdc8df5a4b bp 0x7ffdd74c7bd0 sp 0x7ffdd74c7bc8
READ of size 4 at 0x61c00000092c thread T0

   #0 0x7efdc8df5a4a in pmix_ptl_close /home/samuel/devel/openpmix/src/mca/ptl/base/ptl_base_frame.c:246:48
   #1 0x7efdc8ab6913 in pmix_mca_base_framework_close /home/samuel/devel/openpmix/src/mca/base/pmix_mca_base_framework.c:213:19
   #2 0x7efdc898f6f9 in pmix_rte_finalize /home/samuel/devel/openpmix/src/runtime/pmix_finalize.c:81:12
   #3 0x7efdc89ce010 in PMIx_tool_finalize /home/samuel/devel/openpmix/src/tool/pmix_tool.c:1549:5
   #4 0x7efdc931a398 in prun_common /home/samuel/devel/prrte/src/prted/prun_common.c:873:11
   #5 0x55c42c141fbc in prun /home/samuel/devel/prrte/src/tools/prun/prun.c:202:10
   #6 0x55c42c140fb1 in main /home/samuel/devel/prrte/src/tools/prun/main.c:13:12
   #7 0x7efdc822350f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
   #8 0x7efdc82235c8 in __libc_start_main csu/../csu/libc-start.c:381:3
   #9 0x55c42c081404 in _start (/home/samuel/local/prrte/bin/prun+0x1e404) (BuildId: 48b89e8c1a73ff23da8b22f6c11fb0011733abda)

0x61c00000092c is located 172 bytes inside of 1736-byte region [0x61c000000880,0x61c000000f48)
freed by thread T0 here:
   #0 0x55c42c106b82 in free (/home/samuel/local/prrte/bin/prun+0xa3b82) (BuildId: 48b89e8c1a73ff23da8b22f6c11fb0011733abda)
   #1 0x7efdc89cbc4e in PMIx_tool_finalize /home/samuel/devel/openpmix/src/tool/pmix_tool.c:1531:13
   #2 0x7efdc931a398 in prun_common /home/samuel/devel/prrte/src/prted/prun_common.c:873:11
   #3 0x55c42c141fbc in prun /home/samuel/devel/prrte/src/tools/prun/prun.c:202:10
   #4 0x55c42c140fb1 in main /home/samuel/devel/prrte/src/tools/prun/main.c:13:12
   #5 0x7efdc822350f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

previously allocated by thread T0 here:
   #0 0x55c42c106e2e in __interceptor_malloc (/home/samuel/local/prrte/bin/prun+0xa3e2e) (BuildId: 48b89e8c1a73ff23da8b22f6c11fb0011733abda)
   #1 0x7efdc89d7d70 in pmix_tma_malloc /home/samuel/devel/openpmix/src/class/pmix_object.h:248:16
   #2 0x7efdc89d796d in pmix_obj_new_tma /home/samuel/devel/openpmix/src/class/pmix_object.h:693:32
   #3 0x7efdc89b3a03 in pmix_obj_new_debug_tma /home/samuel/devel/openpmix/src/class/pmix_object.h:426:29
   #4 0x7efdc89a1036 in PMIx_tool_init /home/samuel/devel/openpmix/src/tool/pmix_tool.c:650:36
   #5 0x7efdc9311e1b in prun_common /home/samuel/devel/prrte/src/prted/prun_common.c:512:32
   #6 0x55c42c141fbc in prun /home/samuel/devel/prrte/src/tools/prun/prun.c:202:10
   #7 0x55c42c140fb1 in main /home/samuel/devel/prrte/src/tools/prun/main.c:13:12
   #8 0x7efdc822350f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

It looks like the following resolves the ASAN issue:

diff --git a/src/tool/pmix_tool.c b/src/tool/pmix_tool.c
index 0f867b5e..d063e44c 100644
--- a/src/tool/pmix_tool.c
+++ b/src/tool/pmix_tool.c
@@ -1512,7 +1512,7 @@ PMIX_EXPORT pmix_status_t PMIx_tool_finalize(void)
     pmix_iof_static_dump_output(&pmix_client_globals.iof_stdout);
     pmix_iof_static_dump_output(&pmix_client_globals.iof_stderr);
 
-    PMIX_RELEASE(pmix_client_globals.myserver);
+    //PMIX_RELEASE(pmix_client_globals.myserver);
     PMIX_LIST_DESTRUCT(&pmix_client_globals.pending_requests);
     for (n = 0; n < pmix_client_globals.peers.size; n++) {
         if (NULL

**Edit:** It looks like pmix_client_globals.myserver may be accessed at ptl_base_frame.c:246 after it has been freed:

       if (0 <= pmix_client_globals.myserver->sd) {

It looks like the Valgrind warning also goes away with that change. That said, this probably isn't the right modification, so I'll let @rhc54 decide how to proceed with this one.
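
For illustration only, the suspected ordering (release the server object during finalize, then read its `sd` field in a later close step) can be reduced to a standalone sketch. None of the names below (`peer_t`, `g_server`, `tool_init`, `release_server`, `transport_close`) are PMIx symbols; they are hypothetical stand-ins. The sketch shows one generic way to make a late close step tolerate an earlier release, by clearing the pointer on release and checking it before reading `sd`. It is not a proposed patch, and the right fix in PMIx may well be different.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a peer object that owns a socket descriptor. */
typedef struct {
    int sd;
} peer_t;

/* Hypothetical global, playing the role of a "my server" pointer. */
static peer_t *g_server = NULL;

/* Allocate the peer during init. Returns 0 on success, -1 on failure. */
static int tool_init(int sd)
{
    assert(NULL == g_server); /* init must not be called twice in a row */
    g_server = malloc(sizeof(*g_server));
    if (NULL == g_server) {
        return -1;
    }
    g_server->sd = sd;
    return 0;
}

/* Free the peer and clear the global so that no later code
 * can dereference the freed block. */
static void release_server(void)
{
    free(g_server);
    g_server = NULL;
}

/* Late close step. Returns the descriptor it would close,
 * or -1 if there is nothing to close. The NULL check is what
 * keeps this safe when release_server() has already run. */
static int transport_close(void)
{
    if (NULL != g_server && 0 <= g_server->sd) {
        int sd = g_server->sd;
        g_server->sd = -1; /* mark as closed */
        return sd;
    }
    return -1;
}
```

With this arrangement, calling `release_server()` before `transport_close()` turns the close step into a no-op that returns -1 instead of reading freed memory, while calling `transport_close()` first still returns the descriptor that was set in `tool_init()`.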