soobinseo/Transformer-TTS

get_grad_fn: Assertion 'output_nr == 0' failed


Dear Soobinseo:

Thank you for your wonderful work.

When I run train_postnet.py after finishing the first three steps you describe in the README, I get a RuntimeError:

(transformer-tts) xxx@zjlab:~/TTS/Transformer-TTS$ python train_postnet.py 
Processing at epoch 0:   0%|                                   | 0/409 [00:10<?, ?it/s]
Traceback (most recent call last):
  File "train_postnet.py", line 69, in <module>
    main()
  File "train_postnet.py", line 41, in main
    mag_pred = m.forward(mel)
  File "/home/xxx/anaconda3/envs/transformer-tts/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 114, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/xxx/anaconda3/envs/transformer-tts/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 124, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/xxx/anaconda3/envs/transformer-tts/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 65, in parallel_apply
    raise output
  File "/home/xxx/anaconda3/envs/transformer-tts/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 41, in _worker
    output = module(*input, **kwargs)
  File "/home/xxx/anaconda3/envs/transformer-tts/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxx/TTS/Transformer-TTS/network.py", line 170, in forward
    mel = self.cbhg(mel).transpose(1, 2)
  File "/home/xxx/anaconda3/envs/transformer-tts/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxx/TTS/Transformer-TTS/module.py", line 419, in forward
    out, _ = self.gru(highway)
  File "/home/xxx/anaconda3/envs/transformer-tts/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxx/anaconda3/envs/transformer-tts/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 192, in forward
    output, hidden = func(input, self.all_weights, hx, batch_sizes)
  File "/home/xxx/anaconda3/envs/transformer-tts/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 323, in forward
    return func(input, *fargs, **fkwargs)
  File "/home/xxx/anaconda3/envs/transformer-tts/lib/python3.6/site-packages/torch/nn/_functions/rnn.py", line 287, in forward
    dropout_ts)
RuntimeError: torch/csrc/autograd/variable.cpp:115: get_grad_fn: Assertion `output_nr == 0` failed.

The error is the same whether I use a single GPU or multiple GPUs, and whether I wrap the model with DataParallel or DistributedDataParallel.
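For reference, this is roughly what I tried in both cases. It is only a minimal sketch: I assume the ModelPostNet class from this repo's network.py, and the dummy mel batch and its shape are purely illustrative.

```python
import torch
import torch.nn as nn
from network import ModelPostNet  # postnet model from this repo

# Case 1: DataParallel (as in train_postnet.py) -- fails with one GPU or several
m = nn.DataParallel(ModelPostNet().cuda())

# Case 2: DistributedDataParallel -- same RuntimeError
# torch.distributed.init_process_group(backend='nccl', init_method='env://')
# m = nn.parallel.DistributedDataParallel(ModelPostNet().cuda())

mel = torch.randn(8, 200, 80).cuda()  # dummy (batch, time, n_mels) batch, shape only for illustration
mag_pred = m.forward(mel)             # -> RuntimeError: get_grad_fn: Assertion `output_nr == 0` failed
```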

Do you have any idea how to solve this? Thank you very much.