jctian98/e2e_lfmmi

timeout when training


Hi, when I try to train a model on aishell1, I run into a connect() timeout. Can you help me?

Hi, could you post some logs here so I can check the problem?

Sorry, I can't download the logs directly. The error is: "RuntimeError: connect() timed out"; in the launch configs: rdzv_configs: {'rank': 1, 'timeout': 900}

I'm a bit confused about the connect() function. Is that the k2.connect(), or some communication operation in DDP? I suppose we don't need a dictionary called rdzv_configs?
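If the error does come from the DDP rendezvous rather than k2.connect() (the thread has not settled this), a simple first check is whether each node can open a TCP connection to the rendezvous endpoint before the timeout expires. Below is a minimal sketch. It assumes the endpoint is given by the MASTER_ADDR / MASTER_PORT environment variables, as torch.distributed's env:// initialization expects. `can_reach_rendezvous` is a hypothetical helper, not part of e2e_lfmmi or PyTorch. It only tests plain TCP reachability and does not perform the rendezvous itself.

```python
import os
import socket


def can_reach_rendezvous(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    Hypothetical diagnostic helper: it checks only that something is
    listening on the rendezvous endpoint, not that the PyTorch
    rendezvous protocol itself succeeds.
    """
    try:
        # create_connection raises OSError (including socket.timeout and
        # ConnectionRefusedError) if the endpoint cannot be reached in time.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: the launcher exports MASTER_ADDR / MASTER_PORT; the
    # fallbacks here are only placeholders for a single-machine run.
    host = os.environ.get("MASTER_ADDR", "127.0.0.1")
    port = int(os.environ.get("MASTER_PORT", "29500"))
    print(f"{host}:{port} reachable: {can_reach_rendezvous(host, port)}")
```

Run it on the machine hosting the failing worker (rank 1 in the reported config). If it prints `False`, the timeout is more likely a network, firewall, or address problem than a training-code problem.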