How to combine mel spectrograms generated by FastSpeech with LPCNet as the vocoder
alokprasad opened this issue · 5 comments
LPCNet is quite fast, in fact faster than real time on low-footprint devices. If someone can combine FastSpeech (mel spectrogram output) with LPCNet as the vocoder, the combination could be an even faster TTS.
Modify `synthesis.py`:
https://github.com/xcmyz/FastSpeech/blob/master/synthesis.py#L65
LPCNet uses 20 features per frame while FastSpeech outputs 80.
What changes are needed for FastSpeech to output 20 instead of 80?
> lpcnet is quite fast , infact faster than real time on low footprint devices, if we can someone combine fastspeech ( mel spectogram ) to be fed to lpcnet, combination could be even faster tts.
You need a different way to process the audio, and then you need to modify the hyperparameters in hparams.py.
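A minimal sketch of the kind of hyperparameter change this implies. The attribute names below (`num_mels`, `sample_rate`, `hop_length`) are illustrative and may not match the exact names in FastSpeech's hparams file:

```python
# Hypothetical sketch: names are illustrative, not the exact
# FastSpeech hparams.py identifiers.
class HParams:
    def __init__(self, num_mels=80, sample_rate=22050, hop_length=256):
        self.num_mels = num_mels        # feature bands per frame
        self.sample_rate = sample_rate  # audio sample rate in Hz
        self.hop_length = hop_length    # samples per frame hop

# For an LPCNet-style setup: 20 features per frame at 16 kHz audio.
hp = HParams(num_mels=20, sample_rate=16000, hop_length=160)
print(hp.num_mels)  # 20
```

The key point is that the feature dimension, sample rate, and frame hop must all be made consistent between the acoustic model's output and what the vocoder expects.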
@xcmyz I made some changes to FastSpeech to integrate it with LPCNet. Here are my changes:
- First, preprocessed the audio (LJSpeech) and converted it to 16-bit signed PCM (s16):
```shell
mkdir -p dataset/LJSpeech-1.1/pcms
for i in dataset/LJSpeech-1.1/wavs/*.wav
# sample rate 16 kHz for LPCNet, or 22050?
do sox "$i" -r 16000 -c 1 -t sw - > dataset/LJSpeech-1.1/pcms/"$(basename "$i" | cut -d. -f1)".s16
done
```
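For the training side, the resulting `.s16` files are headerless 16-bit signed PCM. A sketch (assuming little-endian output from sox, which is typical on x86) of loading such a file into a normalized float array for feature extraction; `load_s16` is a hypothetical helper, not part of FastSpeech or LPCNet:

```python
import os
import tempfile
import numpy as np

def load_s16(path):
    """Load headerless 16-bit signed little-endian PCM into [-1, 1] floats."""
    pcm = np.fromfile(path, dtype='<i2')  # 16-bit signed, little-endian
    return pcm.astype(np.float32) / 32768.0

# Round-trip check with a synthetic 0.1 s, 440 Hz tone at 16 kHz:
path = os.path.join(tempfile.gettempdir(), 'test.s16')
tone = np.sin(2 * np.pi * 440 * np.arange(1600) / 16000)
(tone * 16000).astype('<i2').tofile(path)
audio = load_s16(path)
print(len(audio), float(np.abs(audio).max()))
```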
- Then apply the diff below so FastSpeech trains with 20 features instead of 80:
https://github.com/alokprasad/binaries/blob/master/fast_speech_lpcnet.diff
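One caveat worth noting: LPCNet's 20 features are not actually mels but 18 Bark-scale cepstral coefficients plus two pitch parameters, so for real use the features should come from LPCNet's own extraction tool (`dump_data`). Purely to illustrate the dimension change on the FastSpeech side, here is a self-contained sketch of a 20-band triangular mel filterbank at 16 kHz (all names here are illustrative, not from the linked diff):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels=20, n_fft=512, sr=16000):
    """Triangular filters evenly spaced on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

fb = mel_filterbank(n_mels=20)
print(fb.shape)  # (20, 257)
```

Multiplying this matrix with a magnitude spectrogram of shape (257, frames) yields 20-band features per frame instead of 80.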
Will keep you posted.
Looking forward to your update @alokprasad !