When I run train.py with one GPU and the dataset that sam_hq used, the process hangs and makes no progress

Question

When I run train.py with one GPU and the dataset that sam_hq used, the process stops and makes no further progress.

Ryanye2000 opened this issue a year ago · 5 comments

Ryanye2000 commented a year ago

Sorry to bother you again.

Answer 1 · 2023-11-27T03:28:20.000Z

/user75/sam-hq/train# python -m torch.distributed.launch --nproc_per_node=1 train.py --checkpoint ./pretrained_checkpoint/sam_vit_b_01ec64.pth --model-type vit_b --output work_dirs/hq_sam_b

I also tried this, but there is still no response. It just stops at this line, and no error is reported.

Answer 2 · 2023-11-27T06:58:17.000Z

It's weird: I tried to run the demo, and it also gives no response and reports no error.

Answer 3 · 2023-11-28T03:25:05.000Z

Hi, can you instead run "python -m pdb demo/demo_hqsam.py" to see which line of code it is stuck on?
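
If stepping through pdb is awkward (for example when the script is started through torch.distributed.launch), a non-interactive alternative is Python's standard faulthandler module, which can dump the stack of every thread after a timeout. It runs in a separate watchdog thread, so it can usually report even when the main thread is blocked. This is only a minimal sketch: the helper names arm_hang_watchdog and disarm_hang_watchdog and the 60-second default are hypothetical and not part of sam-hq.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s=60.0, repeat=True, file=sys.stderr):
    """Dump the stack of every thread to `file` once `timeout_s` seconds pass.

    With repeat=True the dump is written again every `timeout_s` seconds,
    so a process that stays stuck keeps reporting the same location.
    `file` must be a real file object (it needs a file descriptor).
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=repeat, file=file)


def disarm_hang_watchdog():
    """Cancel a pending dump once the suspicious section has finished."""
    faulthandler.cancel_dump_traceback_later()
```

One way to use it would be to call arm_hang_watchdog() before any other import at the top of train.py or demo/demo_hqsam.py. If the process hangs, the last frame printed under "most recent call first" is the line it is stuck on.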

Answer 4 · 2023-11-28T06:48:39.000Z

Hi, can you instead run "python -m pdb demo/demo_hqsam.py" to see which line of code it is stuck on?

I fixed it by swapping the order of "import torch" and "import os".

Answer 5 · 2023-11-28T07:23:20.000Z

This may indicate that your CUDA version and your PyTorch version do not match.
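
A quick way to check for such a mismatch is to print what the installed PyTorch build reports and compare it with the CUDA version shown by nvidia-smi. The sketch below is an illustration only: the function name describe_torch_cuda is hypothetical, and it returns empty values rather than failing if PyTorch is not installed.

```python
def describe_torch_cuda():
    """Collect the version details needed to spot a CUDA/PyTorch mismatch."""
    info = {
        "torch": None,           # installed PyTorch version
        "built_for_cuda": None,  # CUDA version this PyTorch build targets
        "cuda_available": None,  # whether PyTorch can actually use a GPU
        "device": None,          # name of GPU 0, if one is usable
    }
    try:
        import torch
    except ImportError:
        return info  # PyTorch is not installed in this environment

    info["torch"] = torch.__version__
    info["built_for_cuda"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info


print(describe_torch_cuda())
```

If built_for_cuda is None, or is newer than the CUDA version that nvidia-smi reports for the driver, reinstalling a PyTorch build that matches the driver is a reasonable first thing to try.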