open-mpi/hwloc

Convenience value path of hwloc_memattr_get_value() lacks target object type checking

HadrienG2 opened this issue · 2 comments

In hwloc 2.10.0 and current master, when hwloc_memattr_get_value() is called for a convenience value (CAPACITY or LOCALITY attribute), it forwards the call to the hwloc__memattr_get_convenience_value() utility function.

This function, in turn, performs operations that either...

  • Assume the requested target is a NUMA node when querying the CAPACITY attribute.
  • Assume the requested target has a CPU set when querying the LOCALITY attribute (both assumptions are sketched just below).
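Concretely, the convenience path presumably boils down to something like the following (an illustrative sketch with made-up names, not hwloc's actual code):

```c
#include <hwloc.h>

/* Illustrative sketch of what hwloc__memattr_get_convenience_value()
 * amounts to; the real implementation differs, but the two unchecked
 * assumptions are the same. */
static hwloc_uint64_t
sketch_convenience_value(hwloc_memattr_id_t id, hwloc_obj_t target)
{
  if (id == HWLOC_MEMATTR_ID_CAPACITY)
    /* Only meaningful if target is really a NUMA node. */
    return target->attr->numanode.local_memory;
  if (id == HWLOC_MEMATTR_ID_LOCALITY)
    /* Dereferences target->cpuset, which is NULL on some objects. */
    return (hwloc_uint64_t) hwloc_bitmap_weight(target->cpuset);
  return 0;
}
```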

However, neither hwloc_memattr_get_value() nor hwloc__memattr_get_convenience_value() checks these assumptions. As a result, a program that calls hwloc_memattr_get_value() on a target which is not a NUMA node can get arbitrary garbage (when querying the CAPACITY attribute) or a segfault (when querying the LOCALITY attribute on an object which doesn't have a cpuset).
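For instance, something along these lines goes unrejected today (a minimal misuse sketch, assuming an already-loaded topology):

```c
#include <hwloc.h>

/* Minimal misuse sketch; neither call below is currently rejected. */
static void misuse_example(hwloc_topology_t topology)
{
  hwloc_uint64_t value;

  /* Not a NUMA node: CAPACITY yields arbitrary garbage instead of failing. */
  hwloc_obj_t pkg = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PACKAGE, 0);
  if (pkg)
    hwloc_memattr_get_value(topology, HWLOC_MEMATTR_ID_CAPACITY,
                            pkg, NULL, 0, &value);

  /* I/O objects (only present with I/O discovery enabled) have no cpuset,
   * so the LOCALITY query dereferences NULL and can segfault. */
  hwloc_obj_t osdev = hwloc_get_obj_by_type(topology, HWLOC_OBJ_OS_DEVICE, 0);
  if (osdev)
    hwloc_memattr_get_value(topology, HWLOC_MEMATTR_ID_LOCALITY,
                            osdev, NULL, 0, &value);
}
```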

I think at least one of hwloc_memattr_get_value() and hwloc__memattr_get_convenience_value() should ensure that the input object is a NUMA node and fail with EINVAL otherwise. Since hwloc__memattr_get_convenience_value() returns an unsigned 64-bit integer, its current return type has no spare bits to encode an error, so hwloc_memattr_get_value() would be the most obvious place to insert such a check.
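For illustration, the guard might look roughly like this before forwarding to the convenience helper (a hypothetical sketch, not the actual fix):

```c
/* Hypothetical guard inside hwloc_memattr_get_value(), before forwarding
 * to hwloc__memattr_get_convenience_value(); it also happens to catch a
 * NULL target (see the next comment). */
if (!target_node || target_node->type != HWLOC_OBJ_NUMANODE) {
  errno = EINVAL;
  return -1;
}
```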

It's even worse than that: the given target is read without checking for NULL first. Thanks, I'll fix all this.
At some point, you'll have to explain how you find all those deep bugs :)

I'm just stress-testing my Rust bindings before release, to make sure I have covered as many weird input-validation and error-handling edge cases as I can, and occasionally the bug I find is on your side :) Not that often, though; you've been quite thorough!

I use a mixture of testing approaches:

  • In the few cases where it's not too time-consuming, I sometimes like to use exhaustive tests that cover the full input space (or at least the full space of valid inputs). But that's almost never feasible, and even when it is, I often forget about this approach.
  • Most of the time, I use property-based tests that exercise the functions on random inputs. If the input space is very large or infinite (as with cpusets or topology objects) and the subspace of valid inputs is tiny, I bias the input-generation RNG to increase the odds of producing a valid or near-valid input configuration (a rough C analogue is sketched after this list).
  • In cases where generating random input is infeasible (e.g. building a topology from XML), I fall back to simpler round-trip tests where I export the topology to XML and import it again.
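To make the second bullet concrete, here is a rough C analogue of that kind of randomized test, applied to this very bug (my real tests live in the Rust bindings; the names and the bias ratio here are illustrative):

```c
#include <hwloc.h>
#include <assert.h>
#include <stdlib.h>

/* Illustrative property-based test: draw random objects from the topology
 * and check that the CAPACITY convenience attribute succeeds on NUMA nodes
 * and fails cleanly on everything else. */
static void fuzz_capacity(hwloc_topology_t topology, unsigned iterations)
{
  int maxdepth = hwloc_topology_get_depth(topology);
  for (unsigned i = 0; i < iterations; i++) {
    /* NUMA nodes live at a virtual depth, so draw it explicitly now and
     * then to bias the RNG toward valid inputs, as described above. */
    int depth = (i % 4 == 0) ? HWLOC_TYPE_DEPTH_NUMANODE
                             : (int)((unsigned)rand() % (unsigned)maxdepth);
    unsigned nobjs = hwloc_get_nbobjs_by_depth(topology, depth);
    if (!nobjs)
      continue;
    hwloc_obj_t obj =
      hwloc_get_obj_by_depth(topology, depth, (unsigned)rand() % nobjs);
    hwloc_uint64_t value;
    int err = hwloc_memattr_get_value(topology, HWLOC_MEMATTR_ID_CAPACITY,
                                      obj, NULL, 0, &value);
    if (obj->type == HWLOC_OBJ_NUMANODE)
      assert(!err);       /* valid target: must succeed */
    else
      assert(err == -1);  /* invalid target: should fail with EINVAL, not
                             return garbage (this is what catches the bug) */
  }
}
```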

...and then I measure coverage to check that I have covered most lines of code on my side (except for those edge cases that are simply not reproducible without a weird machine/OS :)).

All this runs on a Windows/macOS/Linux CI, together with a few static and dynamic analyzers, and a lot of build combinatorics, since I make sure I actually support everything I claim to: all hwloc 2.x releases, the minimal Rust compiler version I document, and so on.