wenet-e2e/wetts

aishell3 training error

Closed this issue · 0 comments

Describe the bug
I used the default run.sh script, with v1.json and aishell3 data.
To Reproduce
bash run.sh --stage 1 --stop_stage 1

Expected behavior

Training completes normally. Instead, it fails with the error below:
```
-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "wetts/examples/aishell-3/vits/train.py", line 149, in run
    train_and_evaluate(
  File "wetts/examples/aishell-3/vits/train.py", line 209, in train_and_evaluate
    ) = net_g(x, x_lengths, spec, spec_lengths, speakers)
  File "python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "python3.8/site-packages/torch/nn/parallel/distributed.py", line 1040, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "python3.8/site-packages/torch/nn/parallel/distributed.py", line 1000, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "wetts/vits/models.py", line 612, in forward
    z, m_q, logs_q, y_mask = self.enc_q(y, y_lengths, g=g)
  File "python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "wetts/vits/models.py", line 294, in forward
    x = self.pre(x) * x_mask
  File "python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "python3.8/site-packages/torch/nn/modules/conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "python3.8/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
```

PyTorch suggests the following standalone repro ("You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue."):

```python
import torch
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([32, 513, 1, 296], dtype=torch.float, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(513, 192, kernel_size=[1, 1], padding=[0, 0], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().float()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()
```

The accompanying cuDNN convolution dump:

```
ConvolutionParams
  memory_format = Contiguous
  data_type = CUDNN_DATA_FLOAT
  padding = [0, 0, 0]
  stride = [1, 1, 0]
  dilation = [1, 1, 0]
  groups = 1
  deterministic = false
  allow_tf32 = true
input: TensorDescriptor 0xa0166d60
  type = CUDNN_DATA_FLOAT
  nbDims = 4
  dimA = 32, 513, 1, 296,
  strideA = 151848, 296, 296, 1,
output: TensorDescriptor 0xa0167030
  type = CUDNN_DATA_FLOAT
  nbDims = 4
  dimA = 32, 192, 1, 296,
  strideA = 56832, 296, 296, 1,
weight: FilterDescriptor 0xa01617c0
  type = CUDNN_DATA_FLOAT
  tensor_format = CUDNN_TENSOR_NCHW
  nbDims = 4
  dimA = 192, 513, 1, 1,
Pointer addresses:
  input: 0x7fd35a000000
  output: 0x7fd35ea06600
  weight: 0x7fd40fd9e000
```
Additional context
Following the traceback, I printed the dimensions of x and x_mask at the conv1d input and found that the input channel dimension of 513 is consistent with the default config. Could you provide some advice on this error? Looking forward to your reply.
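For reference, the shape check described above can be sketched like this. It is a minimal CPU-only illustration, assuming the shapes from the cuDNN dump (batch 32, 513 spectrogram channels, 296 frames) and a hypothetical debug print placed just before the failing line `x = self.pre(x) * x_mask` in `wetts/vits/models.py`:

```python
import torch

# Shapes taken from the cuDNN error dump: batch 32, 513 channels, 296 frames.
# CPU tensors here just to illustrate the debug print; the real run is on CUDA.
x = torch.randn(32, 513, 296)
x_mask = torch.ones(32, 1, 296)

# Hypothetical debug print inserted before `x = self.pre(x) * x_mask`:
print("x shape:", tuple(x.shape))           # (32, 513, 296)
print("x_mask shape:", tuple(x_mask.shape)) # (32, 1, 296)

# Stand-in for self.pre: a 1x1 Conv1d mapping 513 -> 192 channels,
# matching the dimensions reported in the error.
pre = torch.nn.Conv1d(513, 192, kernel_size=1)
out = pre(x) * x_mask
print("out shape:", tuple(out.shape))       # (32, 192, 296)
```

On CPU this runs cleanly with the expected 513-channel input, which is consistent with the observation that the shapes themselves look correct and the failure is inside cuDNN.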