LLNL/Aluminum

Thread safety

ndryden opened this issue · 2 comments

Aluminum currently does not provide a documented thread-safety guarantee. In practice, different backends implicitly provide different safety guarantees. I believe the main bottleneck is with the progress engine, which currently uses a single-producer/single-consumer queue.

We should:

  • Provide an explicit, documented standard for thread safety. I propose that, for performance reasons, we allow this to be chosen at compile-time to be either:
    • Multiple threads can simultaneously submit communication operations. If the operations are on the same communicator or compute stream, the ordering must be provided by the user.
    • Only a single thread submits communication at a time. (We can allow this to be different threads/etc.-- it is up to the user to synchronize.)
  • Implement the above. I believe the main task is to improve the queues the progress engine uses and ensure memory pools are thread-safe.

Things that are not necessarily thread-safe:

  • Internal CUDA stuff. (See, e.g., NCCLRequest).
  • Memory pools (depending on setting).
  • Progress engine.

Current testing indicates we are now thread-safe. 🎉

A number of PRs have contributed to this:

  • #124 (support for serializing all MPI calls to one thread)
  • #125 (thread-safety for CUDA resources)
  • #126 (new memory pools)
  • #127 (AL_THREAD_MULTIPLE)
  • #128 (testing with multiple threads)
  • Bugfixes: #130, #135

I will note for posterity that we were previously seeing rare errors with multiple threads and the host-transfer backend. I am not currently seeing those.

Remaining todo is to update the docs.

With #148 merged, I'm declaring this complete.

Any specific problems can be separate issues.