Support distributed training in AllenNLP integration
himkt opened this issue · 2 comments
himkt commented
The current AllenNLPExecutor doesn't take distributed training into account. It would be useful if the executor supported distributed training.
Description
AllenNLP supports multi-device distributed training (https://medium.com/ai2-blog/c4d7c17eb6d6).
However, the current AllenNLPExecutor doesn't handle the multi-GPU case during distributed optimization. In distributed optimization, multiple processes may try to allocate memory on multiple GPUs at the same time, which can lead to out-of-memory errors, so GPUs would need to be scheduled for each process.
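For context, AllenNLP itself enables multi-GPU training through the `distributed` / `cuda_devices` settings in the training config. Below is a minimal sketch of how GPU scheduling could be worked around on the Optuna side today, assuming the usual `AllenNLPExecutor` API; the round-robin `CUDA_VISIBLE_DEVICES` assignment, the config file name, and the metric name are only illustrations, not existing executor behavior:

```python
import os

import optuna
from optuna.integration import AllenNLPExecutor

GPUS = [0, 1, 2, 3]  # GPUs available on this machine (placeholder)


def objective(trial: optuna.Trial) -> float:
    trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    trial.suggest_int("embedding_dim", 64, 256)

    # Hypothetical workaround: pin each trial to one GPU (round-robin) so that
    # concurrent trial processes don't all allocate memory on the same device.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(GPUS[trial.number % len(GPUS)])

    executor = AllenNLPExecutor(
        trial,
        config_file="config.jsonnet",              # assumed training config
        serialization_dir=f"result/{trial.number}",
        metrics="best_validation_accuracy",        # assumed target metric
    )
    return executor.run()


if __name__ == "__main__":
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=20)
```

This only schedules one GPU per single-process trial; scheduling GPUs for the worker processes that AllenNLP spawns inside one distributed trial would need similar handling inside the executor itself.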
himkt commented
Distributed training in AllenNLP spawns worker processes for training, and AllenNLPExecutor may fail to pass environment variables to them.
(himkt/allennlp-optuna#20)
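A minimal sketch of one way the hand-off could be made explicit, assuming trial parameters are passed through environment variables so that `std.extVar` can read them in the Jsonnet config; `_train_worker` and `launch_distributed_training` are hypothetical names, not the actual executor internals:

```python
import os
from typing import Dict

import torch.multiprocessing as mp


def _train_worker(process_rank: int, trial_params: Dict[str, str]) -> None:
    # Re-export the trial parameters in the child process so that the
    # Jsonnet config can still resolve them via std.extVar(...).
    os.environ.update(trial_params)
    # ... call AllenNLP's distributed training entry point here ...


def launch_distributed_training(trial_params: Dict[str, str], world_size: int) -> None:
    # Pass the parameters explicitly through spawn() instead of relying on
    # every spawned worker seeing the parent's os.environ.
    mp.spawn(_train_worker, args=(trial_params,), nprocs=world_size)
```

Making the parameters part of the spawn arguments keeps the executor-to-worker hand-off explicit, whatever the actual fix ends up looking like.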