janvainer/speedyspeech

RuntimeError: stack expects a non-empty TensorList


Hi. Thank you very much for your implementation. I tried to extract the duration by using the default configs (the only difference is that a different dataset is used). However, after 9 iterations, the following error occurred:

  File "code/duration_extractor.py", line 539, in <module>
    logdir=logdir
  File "code/duration_extractor.py", line 390, in fit
    valid_losses = self._validate(valid_loader)
  File "code/duration_extractor.py", line 465, in _validate
    sound, length = self.collate.stft.spec2wav(spec.transpose(1, 2), slen[-1:])
  File "/data/glusterfs_speech_tts_core/11117873/models/speedyspeech_yige/code/stft.py", line 119, in spec2wav
    magnitudes = self.mel2linear(magnitudes)
  File "/data/glusterfs_speech_tts_core/11117873/models/speedyspeech_yige/code/stft.py", line 137, in mel2linear
    return nnls(self.mel_basis, mel)
  File "/data/glusterfs_speech_tts_core/11117873/models/speedyspeech_yige/code/stft.py", line 46, in nnls
    torch.nn.utils.clip_grad_norm_(X, 1)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm_
    total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: stack expects a non-empty TensorList

Could you help me to solve this problem? Thank you ~

Hi, thanks for your interest in this repo. Could you check whether you are able to extract the durations for the default LJSpeech dataset? Could you also print what the inputs to the nnls function look like? (Just add a print statement in your local copy of the repo.)
Also what checkpoint did you use for the duration extractor? Did you train your own, or did you use the default provided with this project?
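
One way to do the printing asked for above is a small helper that summarises each tensor, including whether its .grad has been populated yet. This is only a sketch: `describe` is a hypothetical helper that is not part of the repo, and the tensors below are small made-up stand-ins rather than the real inputs to nnls.

```python
import torch


def describe(name, t):
    """Return a one-line summary of a tensor for debugging."""
    has_grad = t.grad is not None
    return (f"{name}: shape={tuple(t.shape)} dtype={t.dtype} device={t.device} "
            f"requires_grad={t.requires_grad} has_grad={has_grad}")


# Stand-in tensors with small, made-up shapes (not the real nnls inputs).
mel_basis = torch.rand(8, 17)
mel = torch.rand(1, 8, 5)
X = torch.zeros(1, 17, 5, requires_grad=True)

for name, t in [("mel_basis", mel_basis), ("mel", mel), ("X", X)]:
    print(describe(name, t))
```

In a local copy, the same calls could be placed just before the clip_grad_norm_ line in nnls to see whether X already has a gradient at that point.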

I had the same error after I ran this command:
python code/duration_extractor.py

Traceback (most recent call last):
  File "code/duration_extractor.py", line 534, in <module>
    logdir=logdir
  File "code/duration_extractor.py", line 390, in fit
    valid_losses = self._validate(valid_loader)
  File "code/duration_extractor.py", line 461, in _validate
    sound, length = self.collate.stft.spec2wav(spec.transpose(1, 2), slen[-1:])
  File "/home/ubuntu/speedyspeech/code/stft.py", line 119, in spec2wav
    magnitudes = self.mel2linear(magnitudes)
  File "/home/ubuntu/speedyspeech/code/stft.py", line 137, in mel2linear
    return nnls(self.mel_basis, mel)
  File "/home/ubuntu/speedyspeech/code/stft.py", line 46, in nnls
    torch.nn.utils.clip_grad_norm_(X, 1)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm_
    total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: stack expects a non-empty TensorList

Are you training on GPU or CPU? I will need more information to reproduce the error.

OK, I tried a few times and always got the same error. I followed all your steps, and after running
python code/duration_extractor.py
I got this error (as you can see, the model was sent to CUDA):

ubuntu@ip-172-31-68-24:~/speedyspeech$ python code/duration_extractor.py
Model sent to cuda
13000/13000: [===============================>] - ETA 1.6sss
Epoch 1 | Train - l1: 0.09392118094296291, guided_att: 0.00031112270836037095| Valid - l1: 0.3166225552558899, guided_att: 0.0004626042937161401|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 2 | Train - l1: 0.06905996212231115, guided_att: 0.0002700031827905862| Valid - l1: 0.3054344058036804, guided_att: 0.00043494933925103396|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 3 | Train - l1: 0.06594225224749796, guided_att: 0.00026452819020498836| Valid - l1: 0.32097506523132324, guided_att: 0.00046123971696943045|
13000/13000: [===============================>] - ETA 1.1sss
Epoch 4 | Train - l1: 0.06372856097341759, guided_att: 0.0002559272787021014| Valid - l1: 0.32438914477825165, guided_att: 0.00048450268513988703|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 5 | Train - l1: 0.06199859332274921, guided_att: 0.0002551149550952669| Valid - l1: 0.3171471357345581, guided_att: 0.0004896632890449837|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 6 | Train - l1: 0.06050542716322274, guided_att: 0.0002568568125380928| Valid - l1: 0.2853122800588608, guided_att: 0.00046930725511629134|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 7 | Train - l1: 0.05929661129275566, guided_att: 0.0002494556063744859| Valid - l1: 0.25290364027023315, guided_att: 0.0005208489892538637|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 8 | Train - l1: 0.05856953240160284, guided_att: 0.00024662175923448256| Valid - l1: 0.39512471854686737, guided_att: 0.0008473480411339551|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 9 | Train - l1: 0.05783513459959641, guided_att: 0.00024235204981612455| Valid - l1: 0.32342180609703064, guided_att: 0.0010448592656757683|
13000/13000: [===============================>] - ETA 1.0sss
Traceback (most recent call last):
  File "code/duration_extractor.py", line 534, in <module>
    logdir=logdir
  File "code/duration_extractor.py", line 390, in fit
    valid_losses = self._validate(valid_loader)
  File "code/duration_extractor.py", line 461, in _validate
    sound, length = self.collate.stft.spec2wav(spec.transpose(1, 2), slen[-1:])
  File "/home/ubuntu/speedyspeech/code/stft.py", line 119, in spec2wav
    magnitudes = self.mel2linear(magnitudes)
  File "/home/ubuntu/speedyspeech/code/stft.py", line 137, in mel2linear
    return nnls(self.mel_basis, mel)
  File "/home/ubuntu/speedyspeech/code/stft.py", line 46, in nnls
    torch.nn.utils.clip_grad_norm_(X, 1)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm_
    total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: stack expects a non-empty TensorList

@adnan-mehremic Thanks for the info, I will try to replicate this over the weekend.

@janvainer: following pytorch/pytorch#38605, I moved to torch==1.5.1 and the issue no longer occurs. I still have to read up to understand what is going on, though.

Thanks for the link. My problem with this issue is that I am not able to reproduce it, even with a clean setup and reinstalled dependencies; everything works for me even with torch==1.5.0. One possible cause is that the requirements installation failed the last time I tried, and I had to install numpy and some other numeric packages separately. Could you please check that your installed dependencies are exactly the same as in the requirements? Or just post them here and I will check. There may be a dependency version conflict that arises when the packages are installed all at once.
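
The comparison of installed packages against the pinned requirements can be scripted. The sketch below is an assumption-laden illustration: `parse_pins` and `find_mismatches` are hypothetical helpers, the two pins are made-up examples rather than the contents of this repo's requirements file, and `importlib.metadata` needs Python 3.8 or newer (the tracebacks in this thread come from Python 3.7).

```python
from importlib import metadata  # standard library, Python 3.8+


def parse_pins(lines):
    """Return {package: version} for lines of the form 'name==version'."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    Builds with a local suffix (for example '+cpu') compare as different.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Made-up pins for illustration; in practice, pass the lines of the
# requirements file instead.
example_pins = parse_pins(["torch==1.5.0", "numpy==1.18.1  # example only"])
for name, (pinned, installed) in find_mismatches(example_pins).items():
    print(f"{name}: pinned {pinned}, installed {installed}")
```

An empty output means every pinned package is installed at exactly the pinned version.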

Thank you for the awesome project! I had the same problem when training the model for another language, and moving to torch==1.5.1 fixed it for me. All the installed packages matched the ones in the requirements.

Here is some info on the tensors from the nnls function:

mel_basis:  torch.Tensor of size [80, 513]
 mel_spec:  torch.Tensor of size [1, 80, 1128]
        X:  torch.Tensor of size [1, 513, 1128]

In both torch versions the tensors are the same. However, with 1.5.0 torch.nn.utils.clip_grad_norm_ seems to fail with the error mentioned above.

Thanks for trying this out! I will check if version 1.5.1 works for me and will bump up the requirement.

Hi all.

Just to report: I had the same problem, and updating to torch==1.5.1 did indeed solve it. However, in another project I saw a different solution: https://github.com/audio-captioning/dcase-2020-baseline/issues/7. The fix there was to run the backward pass before the gradient clipping. I notice that your code has the same order: first clip, then backward. Perhaps swapping these two calls would solve this problem for good?
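
A minimal sketch of the suggested order, backward first and clip second. This is not the repo's nnls implementation: `nnls_sketch`, its optimizer, step count and learning rate are all assumptions chosen for illustration. The point is only that clip_grad_norm_ works on existing gradients, so X.grad has to be populated by backward() before it is called; the traceback above suggests that torch 1.5.0 tries to stack an empty list of gradient norms when no gradient exists yet.

```python
import torch


def nnls_sketch(A, B, steps=200, lr=0.1):
    """Approximately solve A @ X = B subject to X >= 0 by projected gradient steps."""
    X = torch.zeros(A.shape[1], B.shape[1], requires_grad=True)
    optimizer = torch.optim.SGD([X], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = ((A @ X - B) ** 2).mean()
        loss.backward()                            # 1. populate X.grad
        torch.nn.utils.clip_grad_norm_([X], 1.0)   # 2. clip the gradient that now exists
        optimizer.step()                           # 3. gradient step
        with torch.no_grad():
            X.clamp_(min=0)                        # 4. project back onto X >= 0
    return X.detach()


# Small made-up problem to exercise the loop.
torch.manual_seed(0)
A = torch.rand(6, 4)
B = A @ torch.rand(4, 3)
X = nnls_sketch(A, B)

initial_loss = (B ** 2).mean().item()            # loss at the starting point X = 0
final_loss = ((A @ X - B) ** 2).mean().item()
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

With this order the clipping call always sees a gradient, so it should not depend on how a particular torch version handles the empty case.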