NVIDIA/waveglow

waveglow training error

mataym opened this issue · 3 comments

I trained WaveGlow, but the following error occurred:
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.16e-322
243413: nan
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.6e-322
243414: nan
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8e-323
243415: nan
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4e-323
243416: nan
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2e-323
243417: nan
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1e-323
243418: nan
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5e-324
243419: nan
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.0
243420: nan
Traceback (most recent call last):
File "train.py", line 188, in <module>
train(num_gpus, args.rank, args.group_name, **train_config)
File "train.py", line 135, in train
scaled_loss.backward()
File "/home/speechlab/anaconda3/envs/waveglow/lib/python3.7/contextlib.py", line 119, in __exit__
next(self.gen)
File "/home/speechlab/anaconda3/envs/waveglow/lib/python3.7/site-packages/apex/amp/handle.py", line 123, in scale_loss
optimizer._post_amp_backward(loss_scaler)
File "/home/speechlab/anaconda3/envs/waveglow/lib/python3.7/site-packages/apex/amp/_process_optimizer.py", line 249, in post_backward_no_master_weights
post_backward_models_are_masters(scaler, params, stashed_grads)
File "/home/speechlab/anaconda3/envs/waveglow/lib/python3.7/site-packages/apex/amp/_process_optimizer.py", line 135, in post_backward_models_are_masters
scale_override=(grads_have_scale, stashed_have_scale, out_scale))
File "/home/speechlab/anaconda3/envs/waveglow/lib/python3.7/site-packages/apex/amp/scaler.py", line 183, in unscale_with_stashed
out_scale/grads_have_scale,
ZeroDivisionError: float division by zero
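One possible reading of this log, offered as a simplified model rather than apex's actual code: the loss is already `nan` on every iteration, so every step is treated as a gradient overflow and the loss scale is halved. In Python floats the scale eventually passes through the subnormal range (the 3.16e-322 ... 5e-324 values above) and lands on exactly 0.0, after which unscaling the gradients divides by zero. The sketch below assumes an initial scale of 2**16; the real starting value depends on the amp configuration, and `halve_until_zero` and `unscale` are hypothetical helpers.

```python
def halve_until_zero(init_scale, max_steps=10_000):
    """Simplified model of a dynamic loss scaler that sees an overflow on
    every step (e.g. because the loss is NaN): halve the scale each step,
    record it, and stop once it has underflowed to 0.0."""
    history = []
    scale = init_scale
    for _ in range(max_steps):
        scale /= 2.0  # corresponds to "reducing loss scale to ..."
        history.append(scale)
        if scale == 0.0:
            break
    return history


def unscale(grad, loss_scale):
    """Unscaling divides by the loss scale, which fails once it is 0.0."""
    return grad / loss_scale


history = halve_until_zero(2.0 ** 16)  # assumed initial scale
print([repr(s) for s in history[-4:]])  # ['2e-323', '1e-323', '5e-324', '0.0']
```

Under this model, calling `unscale(1.0, history[-1])` raises `ZeroDivisionError`, matching the end of the traceback. It also suggests that the loss scale is a symptom here: if the loss itself is NaN, no choice of scale makes the gradients finite, so the NaN loss is what needs tracking down.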

How can I solve this problem? Thanks a lot.

Make sure that all your audio files are larger than sample_length.
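A quick way to act on this advice is a standard-library check over the training file list. This is a sketch under several assumptions: the audio is PCM WAV readable by Python's `wave` module, the file list has one path per line, and `find_short_wavs`, `min_samples`, and `train_files.txt` are hypothetical names. Set `min_samples` to the length value your training config actually uses.

```python
import wave
from pathlib import Path


def find_short_wavs(file_list_path, min_samples):
    """Return (path, n_frames) for every WAV listed in file_list_path
    (assumed format: one path per line) that is NOT longer than
    min_samples frames, i.e. files that violate the "larger than" rule."""
    short = []
    for line in Path(file_list_path).read_text().splitlines():
        path = line.strip()
        if not path:
            continue  # skip blank lines
        with wave.open(path, "rb") as wav:
            n_frames = wav.getnframes()  # samples per channel
        if n_frames <= min_samples:
            short.append((path, n_frames))
    return short
```

For example, `for path, n in find_short_wavs("train_files.txt", 16000): print(path, n)` lists offending files; an empty result means every file passes the length check.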


Hi @rafaelvalle, thank you for your reply! I checked that every audio sample is longer than sample_length, but the problem still occurs. Any other suggestions? Thanks a lot!

Hi @mataym, have you solved your problem? I have the same problem.