alex-petrenko/sample-factory

CPU affinity issues on a big cluster

alex-petrenko opened this issue · 1 comments

@edbeeching I will leave this here so we don't forget.

Notes:

  • cpu_affinity() returns only half of the cores (40 instead of 80) which leads to workers only using 50% of the cores.
  • possibly related to the fact that this machine has 2 physical processors? Or maybe custom linux build?

Possible solutions:

  • ignore cpu_affinity() and use some other mechanism to retrieve the number of cores.

This issue was caused by slurm jobs being launched with --hint=nomultithread , switching this to --hint=multithread solves the problem.