Problems using OpenVoice with cuda and >5s source audio
Closed this issue · 2 comments
eginhard commented
Discussed in #232
Originally posted by CiobanuPaul December 23, 2024
I have just upgraded coqui-tts to 0.25.1 to be able to use the OpenVoice voice converter.
The first issue is that an exception is raised when I use "cuda". It only works on "cpu".
The second issue is that the voice conversion output always contains only 5 seconds of content; the rest is white noise (when the source wav is longer than 5 seconds).
I am using Python 3.10.
This is a part of the exception message for the first issue:
  File "/home/catalin/Documents/virtual_envs/venv/lib/python3.10/site-packages/TTS/vc/models/openvoice.py", line 288, in extract_se
    y = torch.FloatTensor(audio_ref)
TypeError: expected TensorOptions(dtype=float, device=cpu, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)) (got TensorOptions(dtype=float, device=cuda:0, layout=Strided, requires_grad=false (default), pinned_memory=false (default), memory_format=(nullopt)))
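The traceback suggests that `audio_ref` was already a tensor on `cuda:0` when it reached the legacy `torch.FloatTensor(...)` constructor, which only builds CPU tensors. The following is a minimal sketch of a device-agnostic conversion, not the fix applied in coqui-tts; `to_float_tensor` is a hypothetical helper name and the 16000-sample dummy input is an assumption made for illustration.

```python
import torch


def to_float_tensor(audio_ref, device=None):
    """Hypothetical helper: convert array-like or tensor input to float32.

    Unlike torch.FloatTensor(...), torch.as_tensor keeps a tensor on
    whatever device it already lives on instead of requiring CPU.
    """
    y = torch.as_tensor(audio_ref, dtype=torch.float32)
    if device is not None:
        # Move only when the caller explicitly asks for a device.
        y = y.to(device)
    return y


# Dummy reference audio (assumed length), placed on CUDA when available.
device = "cuda" if torch.cuda.is_available() else "cpu"
audio_ref = torch.zeros(16000, device=device)

y = to_float_tensor(audio_ref)
print(y.dtype, y.device)
```

On a machine with a GPU, `y` stays on the CUDA device; on a CPU-only machine the same code runs unchanged.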
CiobanuPaul commented
I also tried the FreeVC voice converter, and it gives me a similar error when using CUDA. I didn't have this issue before.