[Bug] RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)` when training tacotron2

Question

kin0303 opened this issue 3 years ago · 19 comments

🐛 Description

I get a UserWarning when I try to train Tacotron2 on LJSpeech. The warning is shown below:

/media/DATA-2/TTS/coqui/TTS/TTS/tts/models/tacotron2.py:341: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  alignment_lengths = mel_lengths // self.decoder.r

Then, in epoch 8/1000 at step 1886/3243, I got this error:

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Is the warning the cause of the error? How can I handle it?
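On the warning itself: it is about how negative quotients are rounded, and the text of the warning already names the replacement, `torch.div(a, b, rounding_mode='floor')`. The sketch below uses plain Python integers rather than tensors to show where the two rounding modes differ. The lengths and the reduction factor `r` are made-up values for illustration. Since sequence lengths are non-negative, both modes agree for them, which suggests (without proving) that the warning is unrelated to the cuBLAS failure.

```python
import math


def trunc_div(a: int, b: int) -> int:
    """Round the quotient toward zero (the 'trunc' mode named in the warning)."""
    return math.trunc(a / b)


def floor_div(a: int, b: int) -> int:
    """Round the quotient toward negative infinity (the 'floor' mode)."""
    return a // b


# Hypothetical mel lengths and reduction factor, chosen only for illustration.
mel_lengths = [812, 405, 7]
r = 2

for n in mel_lengths:
    # For non-negative operands both modes give the same result.
    assert trunc_div(n, r) == floor_div(n, r)

# The two modes differ only when the quotient is negative.
print(trunc_div(-7, 2), floor_div(-7, 2))  # prints "-3 -4"
```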

Here is the full error output:

   --> STEP: 1886/3243 -- GLOBAL_STEP: 27830
     | > decoder_loss: 5.84848  (6.74442)
     | > postnet_loss: 6.73053  (7.62051)
     | > stopnet_loss: 0.29443  (0.42176)
     | > decoder_coarse_loss: 6.32641  (7.27556)
     | > decoder_ddc_loss: 0.00271  (0.00433)
     | > ga_loss: 0.00282  (0.00469)
     | > decoder_diff_spec_loss: 1.54124  (1.60230)
     | > postnet_diff_spec_loss: 3.17193  (3.19602)
     | > decoder_ssim_loss: 0.91339  (0.91384)
     | > postnet_ssim_loss: 0.92837  (0.92731)
     | > loss: 6.67432  (7.51629)
     | > align_error: 0.98659  (0.97846)
     | > grad_norm: 3.66964  (5.65185)
     | > current_lr: 0.00000 
     | > step_time: 0.72660  (0.52215)
     | > loader_time: 0.00150  (0.00552)

 ! Run is kept in /media/DATA-2/TTS/coqui/TTS/run-April-19-2022_01+40PM-0cf3265a
Traceback (most recent call last):
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 1461, in fit
    self._fit()
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 1445, in _fit
    self.train_epoch()
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 1224, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 1057, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 946, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 902, in _model_train_step
    return model.train_step(*input_args)
  File "/media/DATA-2/TTS/coqui/TTS/TTS/tts/models/tacotron2.py", line 344, in train_step
    outputs = self.forward(text_input, text_lengths, mel_input, mel_lengths, aux_input)
  File "/media/DATA-2/TTS/coqui/TTS/TTS/tts/models/tacotron2.py", line 206, in forward
    decoder_outputs, alignments, stop_tokens = self.decoder(encoder_outputs, mel_specs, input_mask)
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/DATA-2/TTS/coqui/TTS/TTS/tts/layers/tacotron/tacotron2.py", line 321, in forward
    decoder_output, attention_weights, stop_token = self.decode(memory)
  File "/media/DATA-2/TTS/coqui/TTS/TTS/tts/layers/tacotron/tacotron2.py", line 284, in decode
    decoder_output = self.linear_projection(decoder_hidden_context)
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/DATA-2/TTS/coqui/TTS/TTS/tts/layers/tacotron/common_layers.py", line 25, in forward
    return self.linear_layer(x)
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Environment

{
  "CUDA": {
    "GPU": [
      "NVIDIA GeForce GTX 1660 Ti"
    ],
    "available": true,
    "version": "10.2"
  },
  "Packages": {
    "PyTorch_debug": false,
    "PyTorch_version": "1.11.0+cu102",
    "TTS": "0.6.1",
    "numpy": "1.19.5"
  },
  "System": {
    "OS": "Linux",
    "architecture": [
      "64bit",
      "ELF"
    ],
    "processor": "x86_64",
    "python": "3.8.0",
    "version": "#118~18.04.1-Ubuntu SMP Thu Mar 3 13:53:15 UTC 2022"
  }
}

Additional context

I've already reduced the batch size and am currently using batch size = 4. Could the error be caused by the reduced batch size? If I don't reduce it, training runs out of memory (OOM).

Answer 1 · 2022-04-20T11:36:39.000Z

Hey, could you please try with the latest CUDA 11 version?

Answer 2 · 2022-04-21T05:07:55.000Z

> Hey, could you please try with the latest CUDA 11 version?

Okay, I'll try with CUDA 11.3.

Answer 3 · 2022-04-22T07:50:16.000Z

> > Hey, could you please try with the latest CUDA 11 version?
>
> Okay, I'll try with CUDA 11.3.

I tried it, but I still get the error:

   --> STEP: 267/3243 -- GLOBAL_STEP: 3510
     | > decoder_loss: 33.00844  (33.08039)
     | > postnet_loss: 35.04267  (35.08060)
     | > stopnet_loss: 0.81244  (0.85546)
     | > decoder_coarse_loss: 32.94590  (33.04457)
     | > decoder_ddc_loss: 0.00334  (0.00528)
     | > ga_loss: 0.00708  (0.01021)
     | > decoder_diff_spec_loss: 0.41570  (0.43186)
     | > postnet_diff_spec_loss: 4.45244  (4.44248)
     | > decoder_ssim_loss: 0.99999  (0.99990)
     | > postnet_ssim_loss: 0.99990  (0.99950)
     | > loss: 27.81493  (27.92766)
     | > align_error: 0.97789  (0.96763)
     | > grad_norm: 4.54514  (5.27340)
     | > current_lr: 0.00000 
     | > step_time: 0.30480  (0.25091)
     | > loader_time: 0.00150  (0.03358)

 ! Run is kept in /media/DATA-2/TTS/coqui/TTS/run-April-22-2022_01+20PM-0cf3265a
Traceback (most recent call last):
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 1485, in fit
    self._fit()
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 1469, in _fit
    self.train_epoch()
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 1248, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 1081, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/trainer/trainer.py", line 1018, in _optimize
    loss_dict["loss"].backward()
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/media/DATA-2/TTS/coqui/tts_coqui/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

Environment

{
  "CUDA": {
    "GPU": [
      "NVIDIA GeForce GTX 1660 Ti"
    ],
    "available": true,
    "version": "11.3"
  },
  "Packages": {
    "PyTorch_debug": false,
    "PyTorch_version": "1.11.0+cu113",
    "TTS": "0.6.1",
    "numpy": "1.19.5"
  },
  "System": {
    "OS": "Linux",
    "architecture": [
      "64bit",
      "ELF"
    ],
    "processor": "x86_64",
    "python": "3.8.0",
    "version": "#123~18.04.1-Ubuntu SMP Fri Apr 8 09:48:52 UTC 2022"
  }
}

Answer 4 · 2022-05-07T11:00:15.000Z

In general, this is a sign of insufficient VRAM. You can try a GPU with more VRAM, reduce the batch size, or limit the maximum audio length allowed in training.
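As a rough illustration of the last option, the sketch below drops samples longer than a chosen limit before they ever reach a batch, so the longest sequence (and with it peak memory) is bounded. It is plain Python, not the TTS library's own data loader; `filter_by_max_audio_len`, the file names, and the 10-second limit are hypothetical. The 22050 Hz sample rate is the one LJSpeech is distributed at.

```python
from typing import List, Tuple

# (file name, number of audio samples); the entries below are made up.
Sample = Tuple[str, int]


def filter_by_max_audio_len(samples: List[Sample], max_audio_len: int) -> List[Sample]:
    """Keep only the samples whose length does not exceed max_audio_len."""
    return [s for s in samples if s[1] <= max_audio_len]


sample_rate = 22050  # LJSpeech sample rate in Hz
samples = [
    ("a.wav", 3 * sample_rate),   # 3 seconds
    ("b.wav", 12 * sample_rate),  # 12 seconds, over the limit
    ("c.wav", 9 * sample_rate),   # 9 seconds
]

# Hypothetical limit: discard anything longer than 10 seconds.
kept = filter_by_max_audio_len(samples, max_audio_len=10 * sample_rate)
print([name for name, _ in kept])  # prints "['a.wav', 'c.wav']"
```

The trade-off is that the longest utterances are excluded from training, so the limit is usually set just low enough to stop the out-of-memory failures.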

Answer 5 · 2022-06-06T13:38:06.000Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

Answer 6 · 2022-06-28T04:22:49.000Z

This issue was resolved by adding memory

Answer 7 · 2022-07-26T19:52:20.000Z

Hi @BlackMamba1122,

I am facing the same issue.

Can you please give more detail on what you mean by "adding memory"? Did you use a different GPU with larger VRAM?

Thanks.

Answer 8 · 2022-08-01T07:30:56.000Z

> Hi @BlackMamba1122,
>
> I am facing the same issue.
>
> Can you please give more detail on what you mean by "adding memory"? Did you use a different GPU with larger VRAM?
>
> Thanks.

Hi, sorry, I only just read your comment. Yes: before, I had 16 GiB of RAM (dual channel), and now I have 32 GiB (dual channel). But it still only works with batch_size=4.

Answer 9 · 2023-01-24T14:18:02.000Z

I experienced this issue on Linux and I solved it by running

$ unset LD_LIBRARY_PATH
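`unset` in a shell removes the variable for the rest of that session. A narrower variant, sketched here in Python, starts a single process with the variable removed and leaves the parent environment untouched. The helper names are hypothetical and the command in the example is a placeholder, not the actual training command.

```python
import os
import subprocess
import sys
from typing import Dict, List


def env_without_ld_library_path() -> Dict[str, str]:
    """Return a copy of the current environment with LD_LIBRARY_PATH removed."""
    env = dict(os.environ)
    env.pop("LD_LIBRARY_PATH", None)
    return env


def run_without_ld_library_path(cmd: List[str]) -> int:
    """Run cmd in a child process that does not see LD_LIBRARY_PATH."""
    result = subprocess.run(cmd, env=env_without_ld_library_path(), check=False)
    return result.returncode


if __name__ == "__main__":
    # Placeholder command: prints the variable as the child process sees it.
    child = "import os; print(os.environ.get('LD_LIBRARY_PATH'))"
    run_without_ld_library_path([sys.executable, "-c", child])
```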

Answer 10 · 2023-02-01T14:11:49.000Z

I got this error too. It's funny that re-executing the code solved the problem for me.

Answer 11 · 2023-02-21T07:26:22.000Z

> unset LD_LIBRARY_PATH

Saved my day! Thanks!

Answer 12 · 2023-04-19T18:56:07.000Z

I still got this error after trying 'unset LD_LIBRARY_PATH'. In my case it runs on CPU, but not on CUDA.

Answer 13 · 2023-06-15T11:29:26.000Z

I'm still having this issue after running 'unset LD_LIBRARY_PATH'

Answer 14 · 2023-08-04T11:19:11.000Z

> unset LD_LIBRARY_PATH

Hoping to understand this issue more deeply. For those who had success with this command, why does it work for you? I'm reading this link but I'm not following how it is connected to the CUDA error.
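The thread never settles this. One possible explanation, which is an assumption and not confirmed here: on Linux, directories listed in LD_LIBRARY_PATH are searched by the dynamic loader ahead of the default library locations, so a separately installed CUDA toolkit listed there could supply a cuBLAS build that differs from the one PyTorch expects. The sketch below only checks whether any cuBLAS shared libraries are reachable through LD_LIBRARY_PATH; it does not prove that they are the ones being loaded. The function name is hypothetical.

```python
import glob
import os
from typing import Dict, List


def find_cublas_on_ld_library_path(ld_library_path: str) -> Dict[str, List[str]]:
    """Map each directory in ld_library_path to the libcublas files found in it."""
    hits: Dict[str, List[str]] = {}
    for directory in ld_library_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        pattern = os.path.join(glob.escape(directory), "libcublas*.so*")
        matches = sorted(glob.glob(pattern))
        if matches:
            hits[directory] = matches
    return hits


if __name__ == "__main__":
    found = find_cublas_on_ld_library_path(os.environ.get("LD_LIBRARY_PATH", ""))
    if not found:
        print("No libcublas files found on LD_LIBRARY_PATH")
    for directory, libs in found.items():
        print(directory)
        for lib in libs:
            print("  ", lib)
```

If this lists a directory from a system CUDA install, that would be consistent with the assumption above, and it may also explain why the command makes no difference for people whose LD_LIBRARY_PATH is empty.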

Answer 15 · 2023-08-29T14:52:32.000Z

> I experienced this issue on Linux and I solved it by running
> $ unset LD_LIBRARY_PATH

I LOVE YOU

Answer 16 · 2023-09-17T15:11:21.000Z

> I experienced this issue on Linux and I solved it by running
> $ unset LD_LIBRARY_PATH

I figured this out!!! THX! I don't understand it, but I was just in awe.

Answer 17 · 2023-10-18T05:45:37.000Z

> I still got this error after trying 'unset LD_LIBRARY_PATH'. In my case it runs on CPU, but not on CUDA.

Same. Can someone help?

Answer 18 · 2024-03-16T09:20:58.000Z

Also fixed for me after $ unset LD_LIBRARY_PATH and would love to know why.

Answer 19 · 2024-04-26T09:13:49.000Z

> I experienced this issue on Linux and I solved it by running
> $ unset LD_LIBRARY_PATH

Thank you very much, this worked for me, but I'd really like to know why.