When I run train.py with one GPU on the dataset that SAM-HQ used, the process hangs and makes no progress
Ryanye2000 opened this issue · 5 comments
Ryanye2000 commented
/user75/sam-hq/train# python -m torch.distributed.launch --nproc_per_node=1 train.py --checkpoint ./pretrained_checkpoint/sam_vit_b_01ec64.pth --model-type vit_b --output work_dirs/hq_sam_b
I also tried this, but there is still no response. It just stops at this line, and no error is reported.
lkeab commented
Hi, can you instead run "python -m pdb demo/demo_hqsam.py" to see which line of code is stuck?
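If stepping through with pdb is awkward for a process that hangs silently, a different option is the standard-library `faulthandler` module, which can print the traceback of every thread after a timeout. The sketch below is only an illustration and is not part of the sam-hq code; the helper names `enable_hang_dump` and `disable_hang_dump` and the 60-second default are assumptions chosen for this example.

```python
import faulthandler
import sys


def enable_hang_dump(timeout_s=60.0, repeat=True, file=None):
    """Schedule a traceback dump of all threads after `timeout_s` seconds.

    Hypothetical helper: call it at the very top of the script, before the
    imports or calls suspected of hanging. `file` must have a real file
    descriptor; it defaults to sys.stderr.
    """
    target = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(timeout_s, repeat=repeat, file=target)
    return timeout_s


def disable_hang_dump():
    """Cancel the pending dump once the suspect section has finished."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `enable_hang_dump()` as the first statement of the training script should make the stuck line show up in the terminal output once the timeout expires, without needing an interactive debugger.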
Ryanye2000 commented
> Hi, can you instead run "python -m pdb demo/demo_hqsam.py" to see which line of code is stuck?
I fixed it by swapping the order of "import torch" and "import os".
lkeab commented
This may indicate that your CUDA version and PyTorch version are mismatched.
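One way to check for such a mismatch is to print the versions PyTorch reports and compare them with the installed driver (for example, the CUDA version shown by `nvidia-smi`). This is a minimal sketch, not something from the sam-hq repository; the function name `describe_torch_cuda` is an assumption made for this example.

```python
import importlib.util


def describe_torch_cuda():
    """Collect the PyTorch and CUDA details relevant to a version mismatch."""
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not info["torch_installed"]:
        return info

    import torch

    info["torch_version"] = torch.__version__
    # CUDA version PyTorch was built against; None for CPU-only builds.
    info["built_with_cuda"] = torch.version.cuda
    # False when the driver or GPU is not usable from this build.
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is `None`, or `cuda_available` is `False` on a machine that has a GPU, the installed PyTorch build probably does not match the local CUDA setup.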