RuntimeError: stack expects a non-empty TensorList
Hi. Thank you very much for your implementation. I tried to extract the duration by using the default configs (the only difference is that a different dataset is used). However, after 9 iterations, the following error occurred:
File "code/duration_extractor.py", line 539, in <module>
logdir=logdir
File "code/duration_extractor.py", line 390, in fit
valid_losses = self._validate(valid_loader)
File "code/duration_extractor.py", line 465, in _validate
sound, length = self.collate.stft.spec2wav(spec.transpose(1, 2), slen[-1:])
File "/data/glusterfs_speech_tts_core/11117873/models/speedyspeech_yige/code/stft.py", line 119, in spec2wav
magnitudes = self.mel2linear(magnitudes)
File "/data/glusterfs_speech_tts_core/11117873/models/speedyspeech_yige/code/stft.py", line 137, in mel2linear
return nnls(self.mel_basis, mel)
File "/data/glusterfs_speech_tts_core/11117873/models/speedyspeech_yige/code/stft.py", line 46, in nnls
torch.nn.utils.clip_grad_norm_(X, 1)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm_
total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: stack expects a non-empty TensorList
Could you help me solve this problem? Thank you ~
Hi, thanks for your interest in this repo. Could you check whether you are able to extract the durations for the default LJSpeech dataset? Could you also print what the inputs to the nnls function look like? (Just add a print to your local copy of the repo.)
Also what checkpoint did you use for the duration extractor? Did you train your own, or did you use the default provided with this project?
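A minimal sketch of the kind of print that could be added at the top of nnls. The helper name `describe` and the example tensor are hypothetical and not part of the repository; the idea is only to report shape, dtype, device, and whether a gradient is attached, since the failing call is a gradient clip.

```python
import torch

def describe(name, t):
    """Return a one-line summary of a tensor for debugging (hypothetical helper)."""
    return (f"{name}: shape={tuple(t.shape)} dtype={t.dtype} device={t.device} "
            f"requires_grad={t.requires_grad} has_grad={t.grad is not None}")

# Illustrative stand-in tensor; in the real code this would be an nnls input.
mel_basis = torch.rand(80, 513)
line = describe("mel_basis", mel_basis)
print(line)
```

Calling the helper on each argument of nnls, and on X just before the clipping call, should show whether X has a gradient at that point.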
I got the same error after running this command:
python code/duration_extractor.py
Traceback (most recent call last):
File "code/duration_extractor.py", line 534, in <module>
logdir=logdir
File "code/duration_extractor.py", line 390, in fit
valid_losses = self._validate(valid_loader)
File "code/duration_extractor.py", line 461, in _validate
sound, length = self.collate.stft.spec2wav(spec.transpose(1, 2), slen[-1:])
File "/home/ubuntu/speedyspeech/code/stft.py", line 119, in spec2wav
magnitudes = self.mel2linear(magnitudes)
File "/home/ubuntu/speedyspeech/code/stft.py", line 137, in mel2linear
return nnls(self.mel_basis, mel)
File "/home/ubuntu/speedyspeech/code/stft.py", line 46, in nnls
torch.nn.utils.clip_grad_norm_(X, 1)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm_
total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: stack expects a non-empty TensorList
Are you training on GPU or CPU? I will need more information to reproduce the error.
Ok, I tried a few times, and always got the same error. I followed all your steps, and after running this command
python code/duration_extractor.py
I got this error (as you can see, the model was sent to CUDA):
ubuntu@ip-172-31-68-24:~/speedyspeech$ python code/duration_extractor.py
Model sent to cuda
13000/13000: [===============================>] - ETA 1.6sss
Epoch 1 | Train - l1: 0.09392118094296291, guided_att: 0.00031112270836037095| Valid - l1: 0.3166225552558899, guided_att: 0.0004626042937161401|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 2 | Train - l1: 0.06905996212231115, guided_att: 0.0002700031827905862| Valid - l1: 0.3054344058036804, guided_att: 0.00043494933925103396|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 3 | Train - l1: 0.06594225224749796, guided_att: 0.00026452819020498836| Valid - l1: 0.32097506523132324, guided_att: 0.00046123971696943045|
13000/13000: [===============================>] - ETA 1.1sss
Epoch 4 | Train - l1: 0.06372856097341759, guided_att: 0.0002559272787021014| Valid - l1: 0.32438914477825165, guided_att: 0.00048450268513988703|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 5 | Train - l1: 0.06199859332274921, guided_att: 0.0002551149550952669| Valid - l1: 0.3171471357345581, guided_att: 0.0004896632890449837|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 6 | Train - l1: 0.06050542716322274, guided_att: 0.0002568568125380928| Valid - l1: 0.2853122800588608, guided_att: 0.00046930725511629134|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 7 | Train - l1: 0.05929661129275566, guided_att: 0.0002494556063744859| Valid - l1: 0.25290364027023315, guided_att: 0.0005208489892538637|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 8 | Train - l1: 0.05856953240160284, guided_att: 0.00024662175923448256| Valid - l1: 0.39512471854686737, guided_att: 0.0008473480411339551|
13000/13000: [===============================>] - ETA 1.0sss
Epoch 9 | Train - l1: 0.05783513459959641, guided_att: 0.00024235204981612455| Valid - l1: 0.32342180609703064, guided_att: 0.0010448592656757683|
13000/13000: [===============================>] - ETA 1.0sss
Traceback (most recent call last):
File "code/duration_extractor.py", line 534, in <module>
logdir=logdir
File "code/duration_extractor.py", line 390, in fit
valid_losses = self._validate(valid_loader)
File "code/duration_extractor.py", line 461, in _validate
sound, length = self.collate.stft.spec2wav(spec.transpose(1, 2), slen[-1:])
File "/home/ubuntu/speedyspeech/code/stft.py", line 119, in spec2wav
magnitudes = self.mel2linear(magnitudes)
File "/home/ubuntu/speedyspeech/code/stft.py", line 137, in mel2linear
return nnls(self.mel_basis, mel)
File "/home/ubuntu/speedyspeech/code/stft.py", line 46, in nnls
torch.nn.utils.clip_grad_norm_(X, 1)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/nn/utils/clip_grad.py", line 30, in clip_grad_norm_
total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type) for p in parameters]), norm_type)
RuntimeError: stack expects a non-empty TensorList
@adnan-mehremic Thanks for the info, I will try to replicate this over the weekend.
@janvainer: following pytorch/pytorch#38605, I moved to torch==1.5.1 and the issue no longer appears. I still have to read up to understand what is going on, though.
Thanks for the link. My problem with this issue is that I cannot reproduce it: even with a clean setup and reinstalled dependencies, everything works for me, even with torch==1.5.0. One possible cause is that the requirements installation failed the last time I tried, and I had to install numpy and some other numeric packages separately. Could you please check that your installed dependencies exactly match those in the requirements? Or just post them here and I will check. There may be a dependency version conflict that arises when the packages are installed all at once.
Thank you for the awesome project! I had the same problem when training the model for another language, and moving to torch==1.5.1 fixed it for me. All the packages matched the ones in the requirements.
Here is some info on the tensors from the nnls function:
mel_basis: torch.Tensor of size [80, 513]
mel_spec: torch.Tensor of size [1, 80, 1128]
X: torch.Tensor of size [1, 513, 1128]
In both torch versions the tensors are the same. However, with 1.5.0 torch.nn.utils.clip_grad_norm_ seems to fail with the error mentioned above.
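The traceback suggests that no tensor passed to clip_grad_norm_ had a gradient at the time of the call, so the list given to torch.stack was empty. For anyone who cannot upgrade, a version-independent guard could look like the sketch below. The helper name `safe_clip_grad_norm_` is hypothetical and not part of PyTorch or of this repository; the tensor shape is taken from the report above.

```python
import torch

def safe_clip_grad_norm_(tensors, max_norm):
    """Clip gradients only if at least one tensor has a gradient (hypothetical helper)."""
    if isinstance(tensors, torch.Tensor):
        tensors = [tensors]
    with_grad = [t for t in tensors if t.grad is not None]
    if not with_grad:
        # Nothing to clip: this is the situation that raised on torch==1.5.0.
        return torch.tensor(0.0)
    return torch.nn.utils.clip_grad_norm_(with_grad, max_norm)

X = torch.zeros(1, 513, 1128, requires_grad=True)

# No backward pass has run yet, so X.grad is None and clipping is skipped.
no_grad_norm = safe_clip_grad_norm_(X, 1)

# After a backward pass there is a gradient, and it gets clipped to norm 1.
X.sum().backward()
grad_norm = safe_clip_grad_norm_(X, 1)
```

The guard only avoids the crash. If X never has a gradient when the clip is called, the clip is a no-op, so the order of the calls is still worth checking.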
Thanks for trying this out! I will check if version 1.5.1 works for me and will bump up the requirement.
Hi all.
Just to report: I had the same problem, and updating to torch==1.5.1 did solve it. However, in another project I saw a different solution: https://github.com/audio-captioning/dcase-2020-baseline/issues/7. The fix there was to run the backward pass before the gradient clip. I noticed that your code uses the same order as that project originally did: first clip, then backward. Perhaps changing the order of these calls could solve this problem for good?
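A minimal sketch of the suggested call order, using a small projected-gradient loop for a non-negative least-squares problem. This is an illustration under assumptions, not the repository's nnls implementation: the tensor sizes, the SGD optimizer, the learning rate, and the loss are all chosen here for the example.

```python
import torch

torch.manual_seed(0)
A = torch.rand(8, 16)        # stand-in for mel_basis (sizes are illustrative)
B = torch.rand(1, 8, 20)     # stand-in for a mel spectrogram
X = torch.zeros(1, 16, 20, requires_grad=True)
optimizer = torch.optim.SGD([X], lr=0.1)

losses = []
for _ in range(50):
    optimizer.zero_grad()
    loss = ((A @ X - B) ** 2).mean()
    loss.backward()                            # populate X.grad first
    torch.nn.utils.clip_grad_norm_([X], 1.0)   # now there is a gradient to clip
    optimizer.step()
    with torch.no_grad():
        X.clamp_(min=0)                        # keep the solution non-negative
    losses.append(loss.item())
```

With backward() called before clip_grad_norm_, X.grad is always populated when the clip runs, so the empty-list case should not occur regardless of the torch version.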