mratsim/laser

NUMA-aware memory allocation and computation

mratsim opened this issue · 0 comments

Most HPC systems have more than one socket, which poses quite a problem for many parallel libraries.

Even in OpenMP 4, distributing parallel compute across sockets with proc_bind(spread), and then within each socket onto physical cores (so no hyperthreading before all cores are used), was quite an ordeal:
(screenshot: DeepinScreenshot_select-area_20190713005519)
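For readers without access to the screenshot, here is a minimal sketch of the two-level binding idea in plain OpenMP 4 C. It is not the code from the screenshot. The function name `spread_then_close_sum` and its parameters are hypothetical, and the sketch assumes the place list is set to cores (for example `OMP_PLACES=cores`) with cores enumerated socket by socket, so that `proc_bind(spread)` at the outer level lands one thread per socket and `proc_bind(close)` at the inner level fills neighbouring cores. It also assumes the OS places pages on first touch, which is the usual Linux default but not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: fills an array with ones and sums it, using an
 * outer team spread across sockets and an inner team kept close within
 * each socket. Returns -1.0 if allocation fails. */
double spread_then_close_sum(size_t n, int sockets, int cores_per_socket) {
    assert(sockets > 0 && cores_per_socket > 0);
    double *data = malloc(n * sizeof *data);
    if (data == NULL) return -1.0;
    double total = 0.0;

#ifdef _OPENMP
    omp_set_max_active_levels(2); /* allow the nested inner teams */
#endif

    /* Outer level: one thread per socket, spread over the place list. */
    #pragma omp parallel num_threads(sockets) proc_bind(spread) reduction(+:total)
    {
        int s = 0, team = 1;
#ifdef _OPENMP
        s = omp_get_thread_num();
        team = omp_get_num_threads(); /* may be fewer than requested */
#endif
        size_t begin = n * (size_t)s / (size_t)team;
        size_t end = n * ((size_t)s + 1) / (size_t)team;
        double local = 0.0;

        /* Inner level: threads stay close to their parent's place.
         * Writing here is the first touch of each page (assumption:
         * first-touch page placement). */
        #pragma omp parallel for num_threads(cores_per_socket) proc_bind(close) schedule(static)
        for (size_t i = begin; i < end; ++i)
            data[i] = 1.0;

        /* Same static schedule, so each thread mostly reads what it wrote. */
        #pragma omp parallel for num_threads(cores_per_socket) proc_bind(close) schedule(static) reduction(+:local)
        for (size_t i = begin; i < end; ++i)
            local += data[i];

        total += local;
    }

    free(data);
    return total;
}
```

Built with OpenMP enabled (e.g. `-fopenmp`) and run with `OMP_PLACES=cores`, a call such as `spread_then_close_sum(1000, 2, 4)` would target 2 sockets with 4 cores each. Whether the runtime actually grants that layout depends on the machine and the implementation.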

OpenMP 5.0 brings a NUMA-aware allocator; see https://techdecoded.intel.io/essentials/openmp-5-0-a-story-about-threads-and-tasks/ (35 min in):
(screenshot: DeepinScreenshot_select-area_20190713005653)
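As a rough illustration of the OpenMP 5.0 allocator API (not taken from the talk or the screenshot), the sketch below requests memory whose pages are interleaved across storage resources via the `omp_atk_partition` trait. The function name `interleaved_fill_sum` is hypothetical. How the `partition` trait maps onto NUMA nodes is implementation-defined, so treat the interleaving as a request rather than a guarantee. When the compiler does not advertise OpenMP 5.0, the sketch falls back to plain `malloc`.

```c
#include <stddef.h>
#include <stdlib.h>

#if defined(_OPENMP) && _OPENMP >= 201811
#include <omp.h>
#define HAVE_OMP5_ALLOC 1
#else
#define HAVE_OMP5_ALLOC 0
#endif

/* Hypothetical helper: allocates n doubles through an OpenMP 5.0
 * allocator with an interleaved partition trait (when available),
 * fills them with ones and returns their sum, or -1.0 on failure. */
double interleaved_fill_sum(size_t n) {
    double *data;
#if HAVE_OMP5_ALLOC
    /* Ask for pages interleaved across the memory space's resources. */
    omp_alloctrait_t traits[] = {
        { omp_atk_partition, omp_atv_interleaved }
    };
    omp_allocator_handle_t alloc =
        omp_init_allocator(omp_default_mem_space, 1, traits);
    if (alloc == omp_null_allocator) return -1.0;
    data = omp_alloc(n * sizeof *data, alloc);
    if (data == NULL) {
        omp_destroy_allocator(alloc);
        return -1.0;
    }
#else
    /* Fallback without OpenMP 5.0: ordinary heap allocation. */
    data = malloc(n * sizeof *data);
    if (data == NULL) return -1.0;
#endif

    double total = 0.0;
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; ++i)
        data[i] = 1.0;

    #pragma omp parallel for schedule(static) reduction(+:total)
    for (size_t i = 0; i < n; ++i)
        total += data[i];

#if HAVE_OMP5_ALLOC
    omp_free(data, alloc);
    omp_destroy_allocator(alloc);
#else
    free(data);
#endif
    return total;
}
```

Other values of the same trait, such as `omp_atv_nearest` or `omp_atv_blocked`, can be swapped in to compare placement policies, assuming the runtime honours them.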