open-mpi/hwloc

Leaks in OpenCL after set_io_types_filter (reported by valgrind-3.21.0)

jkammerland opened this issue · 4 comments

What version of hwloc are you using?

2.9.3

Which operating system and hardware are you running on?

6.5.10-200.fc38.x86_64
Fedora

Details of the problem

  #include <hwloc.h>

  int main(void)
  {
    hwloc_topology_t topology;
    hwloc_topology_init(&topology);

    /* This line causes a memory leak */
    hwloc_topology_set_io_types_filter(topology, HWLOC_TYPE_FILTER_KEEP_IMPORTANT);

    hwloc_topology_load(topology);

    /* Cleanup */
    hwloc_topology_destroy(topology);
    return 0;
  }

Additional information

Built with this configuration:
./configure --prefix=/my/path --enable-static --disable-nvml --disable-cuda

So I use these deps, if it matters:
libudev
libpciaccess
opencl
libxml2

Ask me if you need help or if there's something I can do to help :)

I forgot the output example

==233768== HEAP SUMMARY:
==233768== in use at exit: 2,175,181 bytes in 15,995 blocks
==233768== total heap usage: 56,199 allocs, 40,204 frees, 1,904,732,056 bytes allocated
==233768==
==233768== LEAK SUMMARY:
==233768== definitely lost: 1,176 bytes in 21 blocks
==233768== indirectly lost: 2,988 bytes in 66 blocks
==233768== possibly lost: 2,120 bytes in 15 blocks
==233768== still reachable: 2,168,897 bytes in 15,893 blocks
==233768== suppressed: 0 bytes in 0 blocks
==233768== Rerun with --leak-check=full to see details of leaked memory

Hello. The only leaks I can reproduce are not in hwloc but in external libraries that allocate "static" things during their initialization and never free them, in case they get reused later. set_io_types_filter() doesn't allocate anything itself, but it enables things that are disabled by default, especially in the PCI/CUDA/OpenCL external libraries. This likely explains why those external leaks appear when you add set_io_types_filter().

If you want to debug further, you should set HWLOC_COMPONENTS_VERBOSE=1 in the environment. You'll get a line such as this after 15 lines of debug output:

hwloc: Final list of enabled discovery components: linux(0x7a),x86(0x2),no_os(0x2),pci(0x48),opencl(0x10),gl(0x10)

linux, x86 and no_os don't have external dependencies and are enabled by default. pci, opencl and gl are only used if set_io_types_filter() is called, so those would be candidates. The list of enabled components can be forced with HWLOC_COMPONENTS. For instance, to disable pci, opencl and gl above, set HWLOC_COMPONENTS=-pci,-opencl,-gl in the environment. Adapt all this to what you get in the verbose messages.
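
If changing the shell environment is inconvenient, the same exclusion can probably be done from inside the program. The sketch below is only an illustration: build_exclude_list() and exclude_hwloc_components() are hypothetical helpers, not part of hwloc, and it assumes hwloc reads HWLOC_COMPONENTS when hwloc_topology_load() runs, so the variable has to be set before that call. setenv() is POSIX.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: turn {"pci", "opencl", "gl"} into "-pci,-opencl,-gl",
 * the exclusion syntax described above for HWLOC_COMPONENTS.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int build_exclude_list(const char *const names[], size_t count,
                              char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        /* Prefix every name with '-', separate entries with ','. */
        int n = snprintf(out + used, outlen - used, "%s-%s",
                         i ? "," : "", names[i]);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1; /* would have been truncated */
        used += (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: export the exclusion list to the environment.
 * Assumption: this must be called before hwloc_topology_load(). */
static int exclude_hwloc_components(const char *const names[], size_t count)
{
    char value[256];

    if (build_exclude_list(names, count, value, sizeof value) != 0)
        return -1;
    return setenv("HWLOC_COMPONENTS", value, 1 /* overwrite */);
}
```

A caller would pass the component names taken from its own verbose output, for example an array holding "pci", "opencl" and "gl", then go on to hwloc_topology_init() and hwloc_topology_load() as usual. Excluding one component at a time is a simple way to find which external library the leaks come from.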

Thanks for the reply :)
Indeed, the memory leaks come from opencl. When I disable opencl for hwloc I see no leaks, but I cannot discover the GPU the same way I did before. I will look at it a bit more when I get some time.

Thanks for confirming. It's hard to know whether those OpenCL leaks are legitimate. A simple C program calling clGetPlatformIDs() already gets lots of leaks in valgrind. I assume they initialize lots of internal state variables during the first cl*() function call. They never release them since there is no clInit()/clExit(), so those variables stay allocated forever. I am closing this issue, but feel free to discuss further here if needed.