Thread safety
ndryden opened this issue · 2 comments
ndryden commented
Aluminum currently does not provide a documented thread-safety guarantee. In practice, different backends implicitly provide different safety guarantees. I believe the main bottleneck is with the progress engine, which currently uses a single-producer/single-consumer queue.
We should:
- Provide an explicit, documented standard for thread safety. I propose that, for performance reasons, we allow this to be chosen at compile-time to be either:
- Multiple threads can simultaneously submit communication operations. If the operations are on the same communicator or compute stream, the ordering must be provided by the user.
- Only a single thread submits communication at a time. (We can allow this to be different threads/etc.-- it is up to the user to synchronize.)
- Implement the above. I believe the main task is to improve the queues the progress engine uses and ensure memory pools are thread-safe.
Things that are not necessarily thread-safe:
- Internal CUDA stuff. (See, e.g.,
NCCLRequest
). - Memory pools (depending on setting).
- Progress engine.
ndryden commented
Current testing indicates we are now thread-safe. 🎉
A number of PRs have contributed to this:
- #124 (support for serializing all MPI calls to one thread)
- #125 (thread-safety for CUDA resources)
- #126 (new memory pools)
- #127 (
AL_THREAD_MULTIPLE
) - #128 (testing with multiple threads)
- Bugfixes: #130, #135
I will note for posterity that we were previously seeing rare errors with multiple threads and the host-transfer backend. I am not currently seeing those.
Remaining todo is to update the docs.