xcmyz/FastSpeech

How to combine mels generated by FastSpeech with LPCNet as the vocoder

alokprasad opened this issue · 5 comments

LPCNet is quite fast, in fact faster than real time on low-footprint devices. If someone can combine FastSpeech's mel-spectrogram output with LPCNet, the combination could make for an even faster TTS.

LPCNet uses 20 mels while FastSpeech uses 80.
What changes are needed for FastSpeech to support 20 instead of 80?

xcmyz commented

> LPCNet uses 20 mels while FastSpeech uses 80.
> What changes are needed for FastSpeech to support 20 instead of 80?

You need a different way to process the audio, and then you need to modify the hyperparameters in hparam.py.
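A minimal sketch of what that hparam.py change might look like. The variable names (`num_mels`, `sample_rate`) are assumptions based on typical FastSpeech configurations, not copied from the repo; check the actual file for the real identifiers:

```python
# Hypothetical excerpt of hparam.py after the change; identifiers are assumed.
num_mels = 20        # was 80: LPCNet consumes 20 features per frame
sample_rate = 16000  # LPCNet operates on 16 kHz audio

print(num_mels, sample_rate)
```

The audio preprocessing pipeline would also have to produce the 20-dimensional features LPCNet expects rather than an 80-bin mel spectrogram.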

@xcmyz i did some changes in fastspeech for integrating with lpcnet here are my changes

  1. First preprocessed the audio (LJSpeech) and converted it to PCM (s16):

```shell
mkdir -p dataset/LJSpeech-1.1/pcms
for i in dataset/LJSpeech-1.1/wavs/*.wav
# sample rate 16 kHz for LPCNet, or 22050?
do sox $i -r 16000 -c 1 -t sw - > dataset/LJSpeech-1.1/pcms/$(basename "$i" | cut -d. -f1).s16
done
```
  2. Then applied the diff below to FastSpeech to train the network with 20 mels:

https://github.com/alokprasad/binaries/blob/master/fast_speech_lpcnet.diff
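As a quick sanity check on step 1: the `.s16` files that sox emits with `-t sw` are raw, headerless signed 16-bit PCM, so the file holds exactly 2 bytes per sample. A small round-trip sketch (the file path and the synthetic tone are illustrative, not from the thread):

```python
import numpy as np

# Write one second of a synthetic 440 Hz tone in the same raw format
# sox's "-t sw" produces: signed 16-bit samples, no header.
sr = 16000
t = np.arange(sr) / sr
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
tone.tofile("/tmp/example.s16")

# Reading it back recovers the samples exactly; a real check would point
# at a file under dataset/LJSpeech-1.1/pcms/ instead.
pcm = np.fromfile("/tmp/example.s16", dtype=np.int16)
print(pcm.shape[0])  # one sample per 2 bytes, no header offset
```

If the sample count does not match the expected duration times 16000, the conversion (or the sample-rate choice) is worth revisiting before training.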

Will keep you posted.

Looking forward to your update @alokprasad !