thorstenMueller/Thorsten-Voice

Very slow

Closed this issue · 2 comments

Is it normal that the voice generation takes several minutes for 4 seconds of speech?

Or is this simply due to Tacotron2?
If trained with FastSpeech2, would the generation be faster?

The bottleneck is the released WaveGrad vocoder, which isn't really fast. On GPU (CUDA) it is faster than on CPU, but it's still far from real time.
I've been training a Fullband-MelGAN vocoder for several days now. Training will take a few more weeks. Once it is ready for release, it should be faster than real time on both GPU and CPU. Hopefully I can release it by the end of June.

I'll close this issue and post an update on my Twitter account (https://twitter.com/ThorstenVoice) once my Fullband-MelGAN is available. Feel free to ask further questions or reopen this issue when needed.

This phrase was generated with my training-in-progress model at a real-time factor (RTF) of around 0.5.
https://soundcloud.com/thorsten-mueller-395984278/fullband-melgan-test-traininginprogress
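
For readers unfamiliar with the term: RTF is commonly defined as synthesis time divided by the duration of the generated audio, so values below 1.0 mean faster than real time. The sketch below only illustrates that arithmetic. The function name and the timings are hypothetical (the "3 minutes" figure is just one reading of "several minutes" from the question), not measurements from this project.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Hypothetical timings, for illustration only.
slow_rtf = real_time_factor(180.0, 4.0)  # assumed 3 minutes for 4 s of speech
fast_rtf = real_time_factor(2.0, 4.0)    # 2 s of compute for 4 s of speech

print(f"slow vocoder: RTF {slow_rtf:.1f}")
print(f"fast vocoder: RTF {fast_rtf:.1f}")
```

Under this definition, an RTF of around 0.5 would mean roughly 2 seconds of compute for a 4-second phrase.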