Very slow

Question



Is it normal that the voice generation takes several minutes for 4 seconds of speech?

Or is this simply due to Tacotron2?
If trained with FastSpeech2, would the generation be faster?

Answer 1 · 2021-05-07T14:38:16.000Z

It's the released WaveGrad vocoder that isn't really fast. On a GPU (CUDA) it is faster than on a CPU, but it's still far from realtime.
I've been training a Fullband-MelGAN vocoder for some days now, and training will take a few more weeks. Once it's ready for release, it will be faster than realtime on both GPU and CPU. Hopefully I can release it by the end of June.

Answer 2 · 2021-05-11T20:03:36.000Z

I'll close this issue and post an update on my Twitter account (https://twitter.com/ThorstenVoice) once my Fullband-MelGAN is available. Feel free to ask further questions or reopen this issue if needed.

This phrase was generated with my still-training Fullband-MelGAN model at an RTF (real-time factor) of around 0.5.
https://soundcloud.com/thorsten-mueller-395984278/fullband-melgan-test-traininginprogress
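To put these speeds side by side, the real-time factor is synthesis time divided by the duration of the generated audio, so RTF < 1 means faster than realtime. The sketch below is illustrative only: the 180 s figure is an assumed reading of "several minutes", and `measure_rtf` takes a hypothetical `synthesize` callable returning raw audio samples. It is not part of any TTS library's API.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the generated audio.

    RTF < 1 is faster than realtime; RTF > 1 is slower than realtime.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical TTS callable
    text: str,
    sample_rate: int,
) -> float:
    """Time one synthesis call and return its RTF (assumes mono samples)."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


# Assumption: "several minutes" taken as about 180 s for 4 s of speech.
slow_rtf = real_time_factor(180.0, 4.0)  # 45x slower than realtime

# At RTF ~0.5, the same 4 s of speech would take about 2 s to synthesize.
fast_seconds = 0.5 * 4.0

print(f"assumed slow case: RTF = {slow_rtf:.1f}")
print(f"RTF 0.5 on 4 s of speech: {fast_seconds:.1f} s of synthesis")
```

Keeping the timing separate from the RTF arithmetic lets the same helper compare any two vocoders on the same text and hardware.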