MiuLab/Taiwan-LLM

How can I fix the torch.distributed.elastic.multiprocessing.errors.ChildFailedError raised when training with accelerate launch (multi-GPU)?

chyiin opened this issue

torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

axolotl.cli.train FAILED