Train the ViT on 4x 2080Ti
wlsrick opened this issue · 1 comment
wlsrick commented
Hello,
I am trying to train the model on a server with 4x 2080Ti GPUs, using the command below:
bash tools/dist_train.sh ./configs/recognition/vit/vitclip_large_k400.py 4 --test-last --validate --cfg-options work_dir=./work_dirs
but it fails with the following error:
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
=========
tools/train.py FAILED
How can I solve it? Thanks a lot~
taoyang1122 commented
Hi, could you please post the complete log here?