About distributed training
Closed this issue · 3 comments
ucasyouzhao1987 commented
How can I run your code with distributed training? I tried setting "use_distributed: True" in your configuration file, but it does not work. It seems to only support single-GPU mode.
Yu-Doit commented
The current version supports both single-node multi-GPU mode and multi-node multi-GPU mode, so just run the train.py script with torchrun. If you encounter any problems, feel free to discuss them here!
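For reference, a minimal sketch of such torchrun invocations is shown below. The flags are standard torchrun options, but the GPU counts, the `--config` argument, and the config path are assumptions for illustration, not values taken from this repo:

```bash
# Single node, 4 GPUs (adjust --nproc_per_node to your GPU count;
# the --config argument and path are placeholders, assuming train.py accepts them)
torchrun --standalone --nproc_per_node=4 train.py --config configs/your_config.yaml

# Multi-node example: 2 nodes with 4 GPUs each.
# Run the same command on every node, changing --node_rank (0 on the first node, 1 on the second).
torchrun --nnodes=2 --node_rank=0 --nproc_per_node=4 \
    --master_addr=<node0_ip> --master_port=29500 \
    train.py --config configs/your_config.yaml
```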
SoshyHayami commented
How about inference on multi-GPU?