p0p4k/vits2_pytorch

RuntimeError: The expanded size of the tensor

Closed this issue · 4 comments

Hi.
I'm training VITS2 and hit the following error. I deleted the file that caused it and restarted training, but the error then occurred on a different file. Do you happen to know what the problem is?

```
Traceback (most recent call last):
File "/workspace/vits2/train_ms.py", line 632, in
main()
File "/workspace/vits2/train_ms.py", line 45, in main
mp.spawn(
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 241, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
while not context.join():
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 158, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 68, in _wrap
fn(i, *args)
File "/workspace/vits2/train_ms.py", line 272, in run
train_and_evaluate(
File "/workspace/vits2/train_ms.py", line 361, in train_and_evaluate
) = net_g(x, x_lengths, spec, spec_lengths, speakers)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1523, in forward
else self._run_ddp_forward(*inputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1359, in _run_ddp_forward
return self.module(*inputs, **kwargs) # type: ignore[index]
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/vits2/models.py", line 1297, in forward
z_slice, ids_slice = commons.rand_slice_segments(
File "/workspace/vits2/commons.py", line 65, in rand_slice_segments
ret = slice_segments(x, ids_str, segment_size)
File "/workspace/vits2/commons.py", line 55, in slice_segments
ret[i] = x[i, :, idx_str:idx_end]
RuntimeError: The expanded size of the tensor (22) must match the existing size (3) at non-singleton dimension 1. Target sizes: [192, 22]. Tensor sizes: [192, 3]
```

Hi, what was the solution to your problem? I'm curious...

@p0p4k Hi, I had data shorter than the segment size, so I removed that data.
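For anyone wondering why short data triggers this exact error: `slice_segments` copies a fixed-length window of spectrogram frames into a preallocated buffer, and a clip with fewer frames than the segment size yields a shorter slice than the buffer expects. A minimal NumPy sketch of that slicing logic (the segment size of 22 frames is taken from the error message above; `slice_segment` is a simplified stand-in for `commons.slice_segments`):

```python
import numpy as np

SEGMENT_SIZE = 22  # frames per training slice, matching the error above

def slice_segment(spec, idx_str, segment_size=SEGMENT_SIZE):
    # Mirrors commons.slice_segments for a single item: take a
    # fixed-length window of frames starting at idx_str.
    return spec[:, idx_str:idx_str + segment_size]

# A normal clip: plenty of frames, so the slice has the full segment length.
long_spec = np.zeros((192, 100))
assert slice_segment(long_spec, 10).shape == (192, SEGMENT_SIZE)

# A too-short clip: only 3 frames, so the slice comes back with 3 columns.
# Assigning that into a preallocated (192, 22) buffer is what raises
# "The expanded size of the tensor (22) must match the existing size (3)".
short_spec = np.zeros((192, 3))
assert slice_segment(short_spec, 0).shape == (192, 3)
```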

Oh right. Thanks!

The issue is in TextAudioSpeakerLoader._filter() method:
https://github.com/p0p4k/vits2_pytorch/blob/1f4f3790568180f8dec4419d5cad5d0877b034bb/data_utils.py#L259C17-L262C39

The wav length estimation there is inaccurate. I fixed it like this (inside the `_filter()` loop):

```python
# Measure the real duration instead of estimating it from the file size.
# Note: in librosa >= 0.10 the keyword is `path=` rather than `filename=`.
wav_length = librosa.get_duration(filename=audiopath) * self.sampling_rate
spec_length = wav_length // self.hop_length
if spec_length < self.min_audio_len // self.hop_length:
    print(f"Audio too short, skipping: {audiopath}")
    continue
```
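The underlying check can also be expressed as pure arithmetic, which makes it easy to pre-scan a filelist before training. A sketch, assuming the common VITS defaults of `hop_length=256` and a 32-frame training segment (both are config-dependent; the run in this issue evidently used 22 frames):

```python
def frames_for(num_samples: int, hop_length: int) -> int:
    # Spectrogram frame count for a clip, matching the integer
    # division used in _filter().
    return num_samples // hop_length

def is_trainable(num_samples: int, hop_length: int = 256,
                 segment_size: int = 32) -> bool:
    # A clip can only be sliced for training if it yields at least
    # segment_size spectrogram frames.
    return frames_for(num_samples, hop_length) >= segment_size

# With hop_length=256 and segment_size=32, anything shorter than
# 8192 samples (~0.37 s at 22.05 kHz) cannot be sliced.
assert not is_trainable(3 * 256)   # a 3-frame clip, as in the traceback
assert is_trainable(8192)
assert not is_trainable(8191)
```

Running this over the raw sample counts of every wav in the filelist flags offending clips up front, instead of crashing mid-epoch.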