open-mpi/hwloc

no new types for module, cluster, tile, complex


Intel CPUID defines "DieGrp", "Die", "Tile" and "Module" levels between "Package" and "Core" (DieGrp, Tile and Module aren't exposed to userspace in Linux). AMD instead defines "Die" and "Complex" (Complex seems to be mapped to Tile in Linux, hence it isn't exposed to userspace either). ARM also defines "Cluster", possibly with multiple levels. Linux exposes a single cluster level and uses L2 to implement it on x86, which makes it mostly useless outside of ARM. Windows is supposed to expose Die and Module, but I couldn't test it (#480).

hwloc only has a Die type so far; everything else becomes a Group with a subtype string indicating what it means. Die was an obvious choice because it is supported by Linux, Windows and CPUID, with a pretty clear meaning. We could also add other types, but:

  • Tile has never been used as far as I know. The term was used for KNL (Knights Landing), but CPUID didn't support it back then.
  • Module is used in real hardware, but Linux doesn't seem to plan to detect or expose it.
  • Cluster being identical to L2 makes it mostly useless on x86. Do we want an additional level on every x86 machine out there?
  • ARM Cluster and x86 Module, or even AMD Complex, might be considered similar. Do they need separate types? If they share a single type, which name do we use?

Keeping Groups for all of these has the advantage of not hardwiring a single name and not adding multiple new types. However, Groups are merged by default, so some of them will disappear. Not sure people would care.
In the case of Clusters, being merged by default is actually good, because it prevents fake x86 clusters (identical to L2) from appearing.

I'd rather not introduce new types whose meaning would be unclear.