Does integration.TorchDistributedTrial support multinode optimization?
siemdejong opened this issue
I'm using Optuna on a SLURM cluster. Suppose I would like to do a distributed hyperparameter optimization using two nodes with two GPUs each. Would submitting a script like `pytorch_distributed_simple.py` to multiple nodes yield the expected results?
I assume every node would be responsible for executing its own trials (i.e. no nodes share trials), and that every GPU on a node is responsible for its own portion of the data, determined by the sampler of `torch.utils.data.DataLoader`. Is this assumption correct, or are edits needed apart from `TorchDistributedTrial`'s requirement to pass `None` to `objective` calls on ranks other than 0?
I already tried the above, but I'm not sure how to check that every node is responsible for distinct trials.
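
One idea is to tag each trial with the node's hostname from rank 0 (the only rank that holds the real `Trial`) and inspect the user attributes afterwards. A sketch, runnable with `gloo` and no GPUs, where the quadratic objective is just a stand-in for the real training loop:

```python
import socket

import optuna
import torch.distributed as dist
from optuna.integration import TorchDistributedTrial

N_TRIALS = 10  # placeholder


def objective(single_trial):
    if dist.get_rank() == 0:
        # Only rank 0 holds the real Trial, so record the node name here,
        # before wrapping it in TorchDistributedTrial.
        single_trial.set_user_attr("hostname", socket.gethostname())
    trial = TorchDistributedTrial(single_trial)
    x = trial.suggest_float("x", -10, 10)
    return x**2  # stand-in for the actual training loop


if __name__ == "__main__":
    # Launch with e.g.: torchrun --nproc_per_node=2 check_trials.py
    dist.init_process_group("gloo")
    if dist.get_rank() == 0:
        study = optuna.create_study()
        study.optimize(objective, n_trials=N_TRIALS)
        for t in study.trials:
            print(t.number, t.user_attrs["hostname"])
    else:
        for _ in range(N_TRIALS):
            objective(None)
```

With an independent in-memory study per node, each node would only ever print its own hostname, so I suspect a shared storage backend would be needed to get a global view across nodes; that's part of what I'm unsure about.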