Cannot run detection training with multiple GPUs on a single node
lixi92 opened this issue · 3 comments
lixi92 commented
I tried to run the detection training code with multiple GPUs on a single node:
python tools/lazyconfig_train_net.py --num-gpus 2 \
--num-machines 1 --machine-rank 0 --dist-url "tcp://127.0.0.1:60900" \
--config-file projects/ViTDet/configs/fetus/cascade_mask_rcnn_vitdet_eva.py \
"train.init_checkpoint='eva_o365.pth'" \
"train.output_dir='output'"
Here, projects/ViTDet/configs/fetus/cascade_mask_rcnn_vitdet_eva.py is the config file for my custom dataset.
However, the run produced no output.
What is going wrong?
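For context, detectron2's `launch()` (which `lazyconfig_train_net.py` calls) starts one worker process per GPU on each machine, with `world_size = num_machines * num_gpus` and `global_rank = machine_rank * num_gpus + local_rank`. The minimal sketch below only reproduces that arithmetic; `plan_workers` and `Worker` are hypothetical names, not detectron2 APIs. With `--num-gpus 2 --num-machines 1` it yields two workers with ranks 0 and 1, both on machine 0, which is why a loopback `--dist-url` such as `tcp://127.0.0.1:60900` is sufficient on a single node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    machine_rank: int  # value passed as --machine-rank on that node
    local_rank: int    # GPU index within that node
    global_rank: int   # rank used by torch.distributed


def plan_workers(num_gpus: int, num_machines: int) -> list[Worker]:
    """Hypothetical helper mirroring the rank arithmetic of detectron2's launch().

    The length of the returned list equals world_size = num_machines * num_gpus.
    """
    workers = []
    for machine_rank in range(num_machines):
        for local_rank in range(num_gpus):
            global_rank = machine_rank * num_gpus + local_rank
            workers.append(Worker(machine_rank, local_rank, global_rank))
    return workers


if __name__ == "__main__":
    # Same layout as the single-node command above: --num-gpus 2 --num-machines 1
    for worker in plan_workers(num_gpus=2, num_machines=1):
        print(worker)
```

Calling `plan_workers(num_gpus=2, num_machines=2)` shows the two-node case: machine 1 hosts global ranks 2 and 3.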
FrancoisPorcher commented
What was the issue? Knowing would help me.
lixi92 commented
There was no problem with the official code; the problem was in the code I added.
The command above works with the official code.
FrancoisPorcher commented
Okay, great, thank you. Did you manage to run it on 2 nodes at the same time? I tried changing --num-machines 1 to --num-machines 2, but it does not work. Should I launch the script on each of the 2 nodes at the same time, setting the same --dist-url for communication?
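detectron2's default argument parser describes multi-machine runs as one command per machine: every machine gets the same `--num-machines` and `--dist-url`, and a distinct `--machine-rank`. The `--dist-url` has to point at an address of machine 0 that the other machines can reach; `127.0.0.1` only works within one node. The sketch below is a hypothetical helper (`node_argv` is not part of detectron2) that builds those per-node commands; the address `10.0.0.1` is a placeholder assumption for machine 0's IP.

```python
import shlex

LOOPBACK = {"127.0.0.1", "localhost"}


def node_argv(master_addr: str, port: int, num_machines: int, num_gpus: int,
              config_file: str, overrides: list[str]) -> list[list[str]]:
    """Hypothetical helper: one argv list per machine for a multi-node run.

    Only --machine-rank differs between machines; --num-machines and
    --dist-url are identical on every node.
    """
    if num_machines > 1 and master_addr in LOOPBACK:
        # Other machines cannot reach machine 0 through a loopback address.
        raise ValueError("use an address of machine 0 reachable from every node")
    dist_url = f"tcp://{master_addr}:{port}"
    commands = []
    for machine_rank in range(num_machines):
        commands.append([
            "python", "tools/lazyconfig_train_net.py",
            "--num-gpus", str(num_gpus),
            "--num-machines", str(num_machines),
            "--machine-rank", str(machine_rank),
            "--dist-url", dist_url,
            "--config-file", config_file,
            *overrides,
        ])
    return commands


if __name__ == "__main__":
    # 10.0.0.1 is a placeholder for machine 0's reachable IP address.
    for argv in node_argv(
        master_addr="10.0.0.1",
        port=60900,
        num_machines=2,
        num_gpus=2,
        config_file="projects/ViTDet/configs/fetus/cascade_mask_rcnn_vitdet_eva.py",
        overrides=["train.init_checkpoint='eva_o365.pth'", "train.output_dir='output'"],
    ):
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(shlex.join(argv))
```

Each printed command is meant to be started on its own node; the workers wait at process-group initialization until all `num_machines * num_gpus` of them have connected (or the timeout expires), and the chosen port must be open between the nodes.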
Also, how did you launch inference?
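One option is the `--eval-only` flag of `lazyconfig_train_net.py`: instead of training, the script instantiates `cfg.model`, loads `train.init_checkpoint`, and runs the evaluation defined by the config's test dataloader and evaluator. The sketch below builds such a command; `eval_argv` is a hypothetical helper, and `output/model_final.pth` assumes training finished and wrote its final checkpoint under `train.output_dir='output'`.

```python
import shlex


def eval_argv(config_file: str, weights: str, num_gpus: int = 1) -> list[str]:
    """Hypothetical helper: argv for evaluating trained weights with --eval-only."""
    return [
        "python", "tools/lazyconfig_train_net.py",
        "--num-gpus", str(num_gpus),
        "--config-file", config_file,
        "--eval-only",
        # Same override style as the training command earlier in this thread.
        f"train.init_checkpoint='{weights}'",
    ]


if __name__ == "__main__":
    argv = eval_argv(
        config_file="projects/ViTDet/configs/fetus/cascade_mask_rcnn_vitdet_eva.py",
        # Assumed location of the checkpoint written at the end of training.
        weights="output/model_final.pth",
        num_gpus=2,
    )
    print(shlex.join(argv))
```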