taoyang1122/adapt-image-models

Training the ViT on 4x 2080Ti

wlsrick opened this issue · 1 comment

Hello,
I am trying to train the model on a server with 4x 2080Ti GPUs, using the command below:
bash tools/dist_train.sh ./configs/recognition/vit/vitclip_large_k400.py 4 --test-last --validate --cfg-options work_dir=./work_dirs
but it fails with the following error:

torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

=========
tools/train.py FAILED

How can I solve it? Thanks a lot~

Hi, could you please post the complete log here?
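
One way to capture the complete log is to send both stdout and stderr to a file while still watching the console. The ChildFailedError line is only the launcher's summary that a worker process exited with an error; the worker's own traceback is usually printed earlier in the output, which is why the full log matters. The sketch below assumes a POSIX shell; `run_and_log` and `train.log` are hypothetical names chosen for illustration, not part of this repository.

```shell
# Hypothetical helper (not part of the repo): run a command and save
# everything it prints, stdout and stderr, to a log file.
run_and_log() {
  log_file="$1"   # first argument: path of the log file to write
  shift           # remaining arguments: the command to run
  # 2>&1 merges stderr into stdout; tee writes the merged stream
  # to the log file and also echoes it to the terminal.
  "$@" 2>&1 | tee "$log_file"
}

# Usage with the command from this issue (commented out so the
# snippet can be sourced without starting a training run):
# run_and_log train.log bash tools/dist_train.sh ./configs/recognition/vit/vitclip_large_k400.py 4 --test-last --validate --cfg-options work_dir=./work_dirs
```

The resulting train.log can then be attached to the issue in full, including the lines printed before the ChildFailedError summary.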