huanngzh/EpiDiff

Stuck before training

Closed this issue · 2 comments

Hi.
Thanks for sharing this nice work.

I tried to train the model with both the original dataset (your example set) and my own custom dataset.

However, the run hangs before training starts whenever the number of GPUs is larger than 1 (DDP setting).


Training on a single GPU works without any problem; the hang only appears with multiple GPUs.

How can I solve this?

Thanks in advance.

Hi, I haven't encountered a similar problem. Do you have more detailed output or instructions?
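For anyone hitting a similar hang, one way to collect more detailed output is to rerun the multi-GPU launch with verbose distributed logging turned on. The snippet below is only a rough sketch, not something from this repo: it assumes the NCCL backend is in use, `build_debug_env` and `run_with_debug` are hypothetical helper names, and the launch command is whatever you already use for training. `NCCL_DEBUG` and `TORCH_DISTRIBUTED_DEBUG` are the standard NCCL and PyTorch logging variables.

```python
import os
import subprocess
import sys


def build_debug_env(base_env=None):
    """Return a copy of the environment with verbose distributed logging enabled."""
    # Copy so that the caller's mapping (or os.environ) is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_DEBUG"] = "INFO"  # NCCL prints its setup and transport choices
    env["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"  # extra PyTorch distributed checks and logs
    return env


def run_with_debug(cmd, timeout_s=None):
    """Run the given launch command with the debug environment and return its exit code.

    If timeout_s is set and the process is still running after that many
    seconds (for example because it hangs), subprocess.TimeoutExpired is raised.
    """
    completed = subprocess.run(cmd, env=build_debug_env(), timeout=timeout_s)
    return completed.returncode


# Example (hypothetical): pass your own multi-GPU launch command as a list,
# e.g. run_with_debug(["<your launcher>", "<your training script>"], timeout_s=600)
```

The last log lines printed before the hang usually show which rank or communication step is blocked, which is the kind of detail that helps narrow this down.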

Hi. I forgot to close this issue.

I solved this issue; there is no code error in the repo.
huggingface/accelerate#2174
This was the key to solving it.

I'll close the issue. Thank you for your answer.