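Integration of FastSpeech (text-to-mel generation) with the fast SqueezeWave vocoder (CPU only). This is one of the fastest TTS solutions: on a single x86 core it generates audio about 3x faster than real time (timings below).
Code is based on: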
https://github.com/xcmyz/FastSpeech
https://github.com/tianrengao/SqueezeWave
Get a SqueezeWave model from
https://drive.google.com/file/d/1RyVMLY2l8JJGq_dCEAAd8rIRIn_k13UB/view?usp=sharing
put it in the SqueezeWave directory and rename it Squeezewave.pt. Choose a checkpoint based on the quality vs. size tradeoff:
L128_large_pretrain  312M
L128_small_pretrain   97M
L64_large_pretrain   324M
L64_small_pretrain   106M
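A minimal sketch of the install step, assuming the checkpoint has already been downloaded and that Squeezewave.pt belongs at the top level of the SqueezeWave directory. The helper name `install_vocoder` and the download path in the usage line are hypothetical; only the target file name Squeezewave.pt comes from these instructions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of either upstream repo): copy a downloaded
# SqueezeWave checkpoint into place under the name Squeezewave.pt.
# Assumes the target is the top level of the SqueezeWave checkout.
install_vocoder() {
  local src="$1"                      # e.g. ~/Downloads/L128_small_pretrain (assumed path)
  local dest_dir="${2:-SqueezeWave}"  # directory that run_inference.sh is run from
  if [ ! -f "$src" ]; then
    echo "checkpoint not found: $src" >&2
    return 1
  fi
  if [ ! -d "$dest_dir" ]; then
    echo "target directory not found: $dest_dir" >&2
    return 1
  fi
  # cp instead of mv keeps the original, so switching checkpoints later is easy
  cp "$src" "$dest_dir/Squeezewave.pt"
  echo "installed $(basename "$src") as $dest_dir/Squeezewave.pt"
}
```

Usage: `install_vocoder ~/Downloads/L128_small_pretrain`. Copying rather than renaming keeps all four checkpoints around for quality/size comparisons.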
-
cd FastSpeech ; bash run_inference.sh
-
cd ../SqueezeWave ; bash run_inference.sh
This generates a wave file (test_synthesis.wav in the SqueezeWave directory).
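The two steps above can be chained in one script. This is a sketch that assumes FastSpeech/ and SqueezeWave/ sit under a common root directory and that the SqueezeWave step picks up the mel spectrogram produced by the FastSpeech step (the hand-off mechanism is not described here, so only the order of the calls is relied on). `run_pipeline` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the text-to-mel step, then the vocoder step.
# Each step runs in a subshell so its cd does not leak into the caller,
# and the vocoder step is skipped if the FastSpeech step fails.
run_pipeline() {
  local root="${1:-.}"   # directory containing FastSpeech/ and SqueezeWave/
  (cd "$root/FastSpeech" && bash run_inference.sh) || {
    echo "FastSpeech step failed" >&2
    return 1
  }
  (cd "$root/SqueezeWave" && bash run_inference.sh) || {
    echo "SqueezeWave step failed" >&2
    return 1
  }
}
```

Usage: `run_pipeline /mount/data/fastspeech_squeezewave` (the root directory seen in the x86 log).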
Timings (excluding model loading time):
Input text: "Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition in being comparatively modern"
Generated audio: 11.5 s, synthesized in about 3.83 s (MEL calculation + vocoder)
Platform: x86, 3.6 GHz, single core
07:40:00alok@/mount/data/fastspeech_squeezewave/FastSpeech$ bash run_inference.sh
MEL Calculation:
2.827802896499634
07:40:37alok@/mount/data/fastspeech_squeezewave/SqueezeWave$ bash run_inference.sh
./test_synthesis.wav
Squeezewave vocoder time
1.0016820430755615
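To collect these numbers without copying them by hand, the console output of each script can be saved (for example with `tee`) and the value on the line after each label extracted. A sketch, assuming the scripts print each timing on the line directly below its label, as in the log above; `time_after_label` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the line that follows the first line starting
# with the given label (e.g. "MEL Calculation:" or "Squeezewave vocoder time").
# Prints nothing if the label is not found.
time_after_label() {
  local label="$1" logfile="$2"
  awk -v label="$label" 'found { print; exit } index($0, label) == 1 { found = 1 }' "$logfile"
}
```

For example, after `bash run_inference.sh | tee vocoder.log` inside SqueezeWave, `time_after_label "Squeezewave vocoder time" vocoder.log` prints the vocoder time for that run.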
@@ On Raspberry Pi (results from @varungujjar)
Raspberry Pi 4, 4 GB
Model: L128_small_pretrain
FastSpeech:
MEL Calculation:
2.8617560863494873
SqueezeWave
Squeezewave vocoder time
14.423999309539795
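The stage times above can be turned into a total synthesis time and a real-time factor (RTF = synthesis time / audio duration; below 1 means faster than real time). A sketch using awk for the floating-point arithmetic; it assumes the Raspberry Pi run synthesized the same 11.5 s sentence as the x86 run, which the Pi numbers do not state explicitly. `rtf_report` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total time, real-time factor and speed-up from the
# audio duration and the two measured stage times (all in seconds).
rtf_report() {
  awk -v a="$1" -v m="$2" -v v="$3" 'BEGIN {
    total = m + v
    printf "total=%.2f s rtf=%.2f speedup=%.2fx\n", total, total / a, a / total
  }'
}

rtf_report 11.5 2.827802896499634 1.0016820430755615   # x86, 3.6 GHz, single core
rtf_report 11.5 2.8617560863494873 14.423999309539795  # Raspberry Pi 4 (assumes same 11.5 s clip)
```

This prints `total=3.83 s rtf=0.33 speedup=3.00x` for x86 and `total=17.29 s rtf=1.50 speedup=0.67x` for the Pi. Under that assumption, the vocoder accounts for most of the Pi time, while the MEL step takes about the same time on both machines.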