hwloc_internal_cpukinds_dup() does not correctly set topology->nr_cpukinds_allocated

Question


HadrienG2 opened this issue 5 months ago · 1 comment

This bug was observed on hwloc v2.10.0, but if I'm reading the source code right, it is still present on master.

In hwloc, CPU kind information is stored in three fields of the topology struct that together behave like a C++ std::vector:

topology->cpukinds points to the actual CPU kind descriptors
topology->nr_cpukinds is the number of valid kind descriptors in the cpukinds array (the ones that can be read)
topology->nr_cpukinds_allocated is the capacity of the underlying allocation, counted in descriptors
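To make this bookkeeping concrete, here is a minimal C sketch of the invariant such a vector-like triplet has to maintain. The types kind_info and kinds_vec and the helper kinds_vec_is_consistent are hypothetical stand-ins for hwloc's internal structures, not its actual API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one CPU kind descriptor. */
struct kind_info {
  int *cpuset;     /* stands in for the cpuset bitmap pointer */
  int efficiency;
};

/* Hypothetical stand-in for the three topology fields. */
struct kinds_vec {
  struct kind_info *kinds; /* like topology->cpukinds */
  unsigned nr;             /* like topology->nr_cpukinds: valid entries */
  unsigned allocated;      /* like topology->nr_cpukinds_allocated: capacity */
};

/* A vector-like container must never hold more valid entries than its
   recorded capacity, and must have a buffer whenever it has entries. */
static int kinds_vec_is_consistent(const struct kinds_vec *v) {
  assert(v != NULL);
  if (v->nr > v->allocated)
    return 0;
  if (v->nr > 0 && v->kinds == NULL)
    return 0;
  return 1;
}
```

For example, the state { buf, 2, 0 } (two valid kinds, zero recorded capacity) matches what the issue describes after duplication, and this check rejects it.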

When a topology is duplicated using hwloc_topology_dup(), duplicating the CPU kind information is delegated to hwloc_internal_cpukinds_dup(). This function correctly allocates a new cpukinds array, fills it, and sets nr_cpukinds to its size, but it never sets nr_cpukinds_allocated. On my machine, the field is simply left at zero.

This causes problems if hwloc_cpukinds_register() is later called on the duplicated topology. Inside the hwloc_internal_cpukinds_register() helper, the allocation is always grown (because the code believes its capacity is zero), and the existing CPU kind structs are then zeroed out by a memset that was only meant to clear the newly allocated entries.
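The failure mode can be reproduced with a simplified model of the grow step. This is an illustrative sketch, not hwloc's code: it assumes, as described above, that the register path grows the buffer whenever the requested size exceeds the recorded capacity and then zeroes every slot from the old recorded capacity onward. The function kinds_vec_reserve and the struct types are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for hwloc's internal types. */
struct kind_info { int *cpuset; int efficiency; };
struct kinds_vec { struct kind_info *kinds; unsigned nr; unsigned allocated; };

/* Model of the grow step: make room for `max` entries and zero the slots
   believed to be new, i.e. those at or past the recorded capacity.
   Returns 0 on success, -1 if realloc fails (the old buffer is kept). */
static int kinds_vec_reserve(struct kinds_vec *v, unsigned max) {
  assert(v != NULL);
  if (max <= v->allocated)
    return 0; /* recorded capacity already suffices */
  struct kind_info *grown = realloc(v->kinds, max * sizeof(*grown));
  if (!grown)
    return -1;
  /* If `allocated` is wrongly 0, this also wipes the live entries [0, nr). */
  memset(&grown[v->allocated], 0, (max - v->allocated) * sizeof(*grown));
  v->kinds = grown;
  v->allocated = max;
  return 0;
}
```

With allocated wrongly left at 0, growing a two-entry vector to three slots clears entries 0 and 1 as well, leaving their cpuset pointers NULL; with the correct capacity, only the new slots are cleared.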

Among other things, this sets the cpuset bitmap pointer of the existing CPU kinds to NULL, so subsequent cpuset comparisons segfault.

I think hwloc_internal_cpukinds_dup() should set new->nr_cpukinds_allocated = old->nr_cpukinds, since that is the size of the kinds allocation it creates.
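The proposed fix amounts to recording the capacity of the freshly allocated buffer. Below is a minimal sketch of a duplication routine that does this, again using hypothetical stand-in types; unlike the real hwloc_internal_cpukinds_dup(), it copies the descriptors shallowly and does not duplicate cpuset bitmaps or info arrays.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for hwloc's internal types. */
struct kind_info { int *cpuset; int efficiency; };
struct kinds_vec { struct kind_info *kinds; unsigned nr; unsigned allocated; };

/* Shallow model of duplicating the kinds array into `dst`.
   Returns 0 on success, -1 if allocation fails. */
static int kinds_vec_dup(struct kinds_vec *dst, const struct kinds_vec *src) {
  assert(dst != NULL && src != NULL);
  dst->kinds = NULL;
  dst->nr = 0;
  dst->allocated = 0;
  if (src->nr == 0)
    return 0; /* nothing to copy; an empty vector is consistent */
  dst->kinds = malloc(src->nr * sizeof(*dst->kinds));
  if (!dst->kinds)
    return -1;
  memcpy(dst->kinds, src->kinds, src->nr * sizeof(*dst->kinds));
  dst->nr = src->nr;
  /* The fix: the new buffer holds exactly src->nr entries. */
  dst->allocated = src->nr;
  return 0;
}
```

Using src->nr rather than src->allocated matters here: the new buffer only holds src->nr entries, so copying the source's larger capacity would overstate it and let a later register call write past the end of the buffer.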

Answer 1 · 2024-08-21T19:21:53.000Z

Thanks. I easily reproduced the segfault, and the fix works. I'll update the regression tests and push the changes tomorrow.