lxtGH/CAE

torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

linglingl635 opened this issue · 1 comment

Why does my terminal report this error after training epoch 0?
How can I fix it?

Hi, we haven't encountered this problem before, and I suspect it is unrelated to the code.
Is your environment set up exactly as described in the README file?
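
To make that comparison easier, here is a minimal sketch that prints the installed versions of a few packages so they can be checked against the README. The package list (`torch`, `torchvision`, `timm`) and the helper name `report_versions` are assumptions for illustration, not taken from this repository; substitute whatever the README actually pins.

```python
import platform
from importlib import metadata


def report_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical package list: replace with the packages the README names.
    packages = ["torch", "torchvision", "timm"]
    print("python", platform.python_version())
    for name, version in report_versions(packages).items():
        print(name, version if version is not None else "not installed")
```

Posting this output together with the full traceback from the failing worker process should make it easier to tell whether a version mismatch is involved.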