Problems using OpenVoice with cuda and >5s source audio

Question

Closed this issue 3 months ago · 2 comments

Discussed in #232

Originally posted by CiobanuPaul, December 23, 2024
I have just upgraded coqui-tts to 0.25.1 to be able to use the OpenVoice voice converter.
The first issue is that an exception occurs if I use "cuda"; it works only on "cpu".
The second issue is that the output of the voice converter always contains only 5 seconds of content. If the source wav is longer than 5 seconds, the rest of the output is white noise.

I am using Python 3.10.
This is part of the exception message for the first issue:

 File "/home/catalin/Documents/virtual_envs/venv/lib/python3.10/site-packages/TTS/vc/models/openvoice.py", line 288, in extract_se
    y = torch.FloatTensor(audio_ref)
TypeError: expected TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)) (got TensorOptions(dtype=float, device=cuda:0, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)))
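
For context, the traceback shows `torch.FloatTensor(audio_ref)` being called on data that already lives on `cuda:0`, while the legacy `torch.FloatTensor` constructor expects CPU data. Below is a minimal sketch of a device-agnostic conversion using `torch.as_tensor`. The helper name `to_float_tensor` and the dummy audio array are hypothetical and for illustration only; this is not necessarily what the library does or how the issue was actually fixed.

```python
import numpy as np
import torch


def to_float_tensor(audio_ref, device="cpu"):
    """Hypothetical helper: return audio_ref as a float32 tensor on `device`.

    torch.as_tensor accepts NumPy arrays, lists, and existing tensors
    (on CPU or GPU), unlike the legacy torch.FloatTensor constructor,
    which only builds CPU tensors.
    """
    return torch.as_tensor(audio_ref, dtype=torch.float32, device=device)


# Pick the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Dummy one-second reference clip at 16 kHz (assumed values, not real audio).
audio_np = np.zeros(16000, dtype=np.float64)

y = to_float_tensor(audio_np, device)   # NumPy array -> float32 tensor
y_again = to_float_tensor(y, device)    # a tensor already on `device` also works
print(y.dtype, y.device.type, tuple(y_again.shape))
```

The same call handles both the NumPy case and the case where the audio has already been moved to the GPU, which is the situation that triggers the `TypeError` above.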

Answer 1 · 2025-01-06T16:34:45.000Z

I tried the FreeVC voice converter as well, and it gives me a similar error when using CUDA. I didn't have this issue before.

Answer 2 · 2025-01-07T15:53:49.000Z

Thanks, I've already fixed the CUDA issues in #244 and will look into the noise next.