NVIDIA/waveglow

NaN or Inf found in input tensor

camjac251 opened this issue · 5 comments

Around the ~1000 epoch mark, I start seeing the warning "WARNING:root:NaN or Inf found in input tensor." on every iteration when training on Colab.

Is this safe to ignore? I tried looking it up and it seems to be related to TensorBoard, but I'm worried it might indicate model collapse or some other problem with training.

Here's a shortened log showing the error. The full log is attached below.

FP16 Run: False
cuDNN Enabled: True
cuDNN Benchmark: True
Loss function defined
Model defined
Optimizer defined
Loaded checkpoint '/content/drive/My Drive/colab/waveglow/outdir/waveglow_current_model' (iteration 108000)
Checkpoint loaded
Dataloader defined
output directory /content/drive/My Drive/colab/waveglow/outdir
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:175: TqdmDeprecationWarning: This function will be removed in tqdm==5.0.0
Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
Epoch:: 10%
971/10000 [2:43:23<617:21:12, 246.15s/it]
Starting Epoch: 931 Iteration: 108001
/content/waveglow/mel2samp.py:58: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
  return torch.from_numpy(data).float(), sampling_rate
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:178: TqdmDeprecationWarning: This function will be removed in tqdm==5.0.0
Please use `tqdm.notebook.tqdm` instead of `tqdm.tqdm_notebook`
07:51:35 108116: -4.826 -5.014 0.00010000LR 2.14s per iter: 100%
116/116 [24:30<00:00, 12.68s/it]
-------------------------------------------------------
Starting Epoch: 946 Iteration: 109741
/content/waveglow/mel2samp.py:58: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
  return torch.from_numpy(data).float(), sampling_rate
08:52:35 109856: nan nan 0.00010000LR 2.10s per iter: 100%
116/116 [09:08<00:00, 4.73s/it]
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
-------------------------------------------------------------------------
WARNING:root:NaN or Inf found in input tensor.


Starting Epoch: 947 Iteration: 109857
/content/waveglow/mel2samp.py:58: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:141.)
  return torch.from_numpy(data).float(), sampling_rate
08:56:39 109972: nan nan 0.00010000LR 2.10s per iter: 100%
116/116 [05:04<00:00, 2.62s/it]
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.

colab-full-error-log.txt
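
In the log above the loss values themselves are printed as nan from epoch 946 onward, so the warning is not the only symptom. A minimal sketch of a guard that locates the first non-finite loss is shown here. It assumes the losses are available as plain Python floats (for example from `loss.item()` in PyTorch); `first_non_finite` is a hypothetical helper name and is not part of the waveglow repository.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN/Inf value in losses, or None if all are finite."""
    for i, value in enumerate(losses):
        if not math.isfinite(value):
            return i
    return None


# Illustrative values only, loosely modeled on the log format above.
example_losses = [-4.826, -5.014, float("nan"), float("nan")]
print(first_non_finite(example_losses))
```

The same `math.isfinite` check could be run once per iteration before a checkpoint is written, so that a checkpoint saved with finite losses is not overwritten by one saved after the loss has become NaN.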

Make sure your audio samples are larger than sample_length

Just to be 100% clear, do you mean segment_length?

Make sure your audio samples are larger than sample_length

Hi @rafaelvalle, can you tell me what exactly sample_length means?
I wrote a function that gets the parameters of a WAV file, as follows:

import wave

def getInfoWavFile(wfile):
    # Read the header fields of a WAV file.
    with wave.open(wfile, 'rb') as f:
        params = f.getparams()
        Channels = f.getnchannels()
        SampleRate = f.getframerate()
        bit_type = f.getsampwidth() * 8  # bits per sample
        frames = f.getnframes()  # number of audio frames
    Duration = frames / float(SampleRate)  # length in seconds
    return params, Channels, SampleRate, bit_type, frames, Duration

In this function, which parameter should be >= sample_length (in my config, sample_length=16000)? Thanks.

Hey @mataym, this is how I got the segment length information for my dataset:

import wave
import contextlib
import os

min_length = 9999999

for file in os.listdir('data'):
    with contextlib.closing(wave.open(os.path.join('data', file),'r')) as f: 
        frames = f.getnframes()
        #rate = f.getframerate()
        #length = frames / float(rate)    
        print(frames)
        if frames < min_length:
            min_length = frames
print()
print(min_length)
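
Building on the snippet above, here is a rough sketch that lists which files fall below a given threshold, instead of only printing the minimum. It assumes the files are mono WAVs already at the training sampling rate, so that the frame count is directly comparable to the configured segment length. `find_short_wavs` is a hypothetical helper name, and 16000 is the value quoted earlier in this thread.

```python
import os
import wave


def find_short_wavs(directory, segment_length):
    """Return (filename, n_frames) for each .wav file with fewer frames than segment_length."""
    short = []
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".wav"):
            continue  # skip anything that is not a WAV file
        with wave.open(os.path.join(directory, name), "rb") as f:
            frames = f.getnframes()
        if frames < segment_length:
            short.append((name, frames))
    return short


if __name__ == "__main__":
    # 'data' is the directory used in the snippet above; 16000 is the
    # threshold mentioned earlier in this thread (an assumption for your setup).
    if os.path.isdir("data"):
        for name, frames in find_short_wavs("data", 16000):
            print(f"{name}: {frames} frames")
```

Any files it reports could then be removed from the training file list or padded, depending on what suits the dataset.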