[BUG] I am using 2 nodes with 4 GPUs each (8 GPUs total), but num_machines is always set to 1.
jackswl opened this issue · 4 comments
Prerequisites
- I have read the documentation.
- I have checked other issues for similar problems.
Backend
Local
Interface Used
CLI
CLI Command
autotrain --config /home/xxx
UI Screenshots & Parameters
No response
Error Logs
I am trying to run on 2 nodes with 4 GPUs each, requested via this PBS directive:
#PBS -l select=2:ncpus=128:ngpus=4:mem=880GB
However, the accelerate launch invocation always shows num_machines=1 when I run the CLI command:
autotrain --config /home/xxx
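To rule out the scheduler as the cause, the allocation itself can be checked independently of autotrain. The following is a minimal sketch, assuming the job runs under PBS and that PBS exposes the allocated hosts through the file named by the PBS_NODEFILE environment variable. The helper names `unique_hosts` and `expected_launch_settings` are hypothetical and are not part of autotrain or accelerate. It only reports the values a multi-node accelerate launch would be expected to use (num_machines, total num_processes, and the rank-0 host); it does not make autotrain run across nodes.

```python
import os
from pathlib import Path


def unique_hosts(nodefile_text: str) -> list[str]:
    """Return the distinct hostnames in a PBS nodefile, in first-seen order."""
    hosts: list[str] = []
    for line in nodefile_text.splitlines():
        host = line.strip()
        # PBS may list a host once per allocated slot, so skip repeats.
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def expected_launch_settings(nodefile_text: str, gpus_per_node: int) -> dict:
    """Derive the values a multi-node accelerate launch would be expected to use.

    Hypothetical helper for diagnosis only.
    """
    hosts = unique_hosts(nodefile_text)
    return {
        "num_machines": len(hosts),
        # accelerate's num_processes is the total across all machines.
        "num_processes": len(hosts) * gpus_per_node,
        # Host that would act as the rank-0 (main) machine.
        "main_host": hosts[0] if hosts else None,
    }


if __name__ == "__main__":
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile is None:
        print("PBS_NODEFILE is not set; this does not look like a PBS job.")
    else:
        text = Path(nodefile).read_text()
        print(expected_launch_settings(text, gpus_per_node=4))
```

If this reports num_machines as 2 while the accelerate launch line still shows num_machines=1, the allocation is fine and the limit is in how the launch command is built.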
@abhishekkrthakur any idea how to resolve this? Am I right that autotrain does not support multi-node training? If so, is there a workaround?
Thanks!
Additional Information
No response
autotrain doesn't support multi-node yet.
Thanks for the reply.
By the way, do you have a rough timeline for when autotrain will support multi-node usage?
I'm asking so I know roughly when to revisit this topic once multi-node support is out.
Thanks a lot.
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 20 days since being marked as stale.