NUMA-aware memory allocation and computation
mratsim opened this issue · 0 comments
Most HPC systems have more than one socket, which poses quite a problem for many parallel libraries.
Even in OpenMP 4, distributing parallel work first across sockets with proc_bind(spread) and then, within each socket, onto physical cores (so that no hyperthreads are used before all cores are busy) was quite an ordeal.
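To make the two-level placement concrete, here is a minimal sketch in C of what it tends to look like with OpenMP 4 clauses: an outer team spread across sockets and a nested inner team kept close to each outer thread. The function name `two_level_sum` and its parameters are hypothetical, the socket and core counts are passed in by hand rather than detected, and the first-touch initialization assumes an OS that places a page on the node of the thread that first writes it (the default policy on Linux).

```c
#include <stddef.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: fills an array of n ones and sums it using a
 * two-level (socket, then core) decomposition. Returns -1.0 on bad input
 * or allocation failure. */
double two_level_sum(size_t n, int sockets, int cores_per_socket)
{
    if (sockets < 1 || cores_per_socket < 1) return -1.0;
    double *a = malloc(n * sizeof *a);
    if (!a) return -1.0;
    double total = 0.0;

#ifdef _OPENMP
    omp_set_max_active_levels(2);  /* allow the nested inner team */
#endif

    /* Outer level: one thread per socket, spread as far apart as possible. */
    #pragma omp parallel for num_threads(sockets) proc_bind(spread) \
            schedule(static) reduction(+:total)
    for (int s = 0; s < sockets; s++) {
        /* Contiguous slice of the array owned by this socket. */
        size_t lo = n * (size_t)s / (size_t)sockets;
        size_t hi = n * (size_t)(s + 1) / (size_t)sockets;
        double part = 0.0;

        /* Inner level: threads packed close to the outer thread. */
        #pragma omp parallel num_threads(cores_per_socket) proc_bind(close) \
                reduction(+:part)
        {
            /* First touch: each inner thread writes the elements it will
             * later read, so the pages can land on its own node. */
            #pragma omp for schedule(static)
            for (size_t i = lo; i < hi; i++) a[i] = 1.0;

            /* Same static schedule, so each thread reads what it wrote. */
            #pragma omp for schedule(static)
            for (size_t i = lo; i < hi; i++) part += a[i];
        }
        total += part;
    }

    free(a);
    return total;
}
```

Calling `two_level_sum(1000, 2, 4)` on a 2-socket machine returns 1000.0 whatever team sizes the runtime actually grants. The binding only takes effect when built with OpenMP enabled (for example `-fopenmp`) and run with places defined, e.g. `OMP_PLACES=cores OMP_PROC_BIND=true`, which is what keeps threads off sibling hyperthreads. Having to hard-code the socket and core counts is a good part of the ordeal.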
OpenMP 5.0 brings a NUMA-aware allocator, see https://techdecoded.intel.io/essentials/openmp-5-0-a-story-about-threads-and-tasks/ (about 35 minutes in).
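For reference, a rough sketch of how the OpenMP 5.0 allocator API might be used for this. The routines `omp_init_allocator`, `omp_alloc`, `omp_free` and `omp_destroy_allocator` and the `omp_atk_partition` trait come from the 5.0 specification; the helper name `interleaved_sum`, the 64-byte alignment, and the choice of `omp_atv_interleaved` are assumptions made for illustration. Whether a given runtime actually honors the partition trait is implementation-dependent. The version guard falls back to plain `malloc`; some compilers ship these routines while still reporting an older `_OPENMP` value, so the guard may need adjusting.

```c
#include <stddef.h>
#include <stdlib.h>
#if defined(_OPENMP) && _OPENMP >= 201811
#include <omp.h>
#define HAVE_OMP_ALLOCATORS 1
#else
#define HAVE_OMP_ALLOCATORS 0
#endif

/* Hypothetical helper: allocates n doubles through an OpenMP 5.0 allocator
 * that asks for pages interleaved across memory partitions, fills them with
 * ones in parallel, and returns their sum (-1.0 on allocation failure). */
double interleaved_sum(size_t n)
{
#if HAVE_OMP_ALLOCATORS
    omp_alloctrait_t traits[] = {
        { omp_atk_partition, omp_atv_interleaved }, /* spread across nodes */
        { omp_atk_alignment, 64 },                  /* assumed cache line  */
    };
    omp_allocator_handle_t al =
        omp_init_allocator(omp_default_mem_space, 2, traits);
    /* If al is omp_null_allocator, omp_alloc uses the default allocator. */
    double *a = omp_alloc(n * sizeof *a, al);
#else
    double *a = malloc(n * sizeof *a);
#endif

    double total = -1.0;
    if (a) {
        total = 0.0;
        #pragma omp parallel for schedule(static) reduction(+:total)
        for (size_t i = 0; i < n; i++) {
            a[i] = 1.0;
            total += a[i];
        }
    }

#if HAVE_OMP_ALLOCATORS
    if (a) omp_free(a, al);
    if (al != omp_null_allocator) omp_destroy_allocator(al);
#else
    free(a);
#endif
    return total;
}
```

`interleaved_sum(1000)` returns 1000.0 on either path. Swapping `omp_atv_interleaved` for `omp_atv_nearest` would instead ask for memory close to the allocating thread, which is probably the more natural fit for per-socket work.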