xcmyz/FastSpeech

How to combine mels generated by FastSpeech with LPCNet as the vocoder

alokprasad opened this issue · 5 comments

LPCNet is quite fast, in fact faster than real time on low-footprint devices. If someone can combine FastSpeech's mel-spectrogram output with LPCNet, the combination could make for an even faster TTS.

LPCNet uses 20 mels while FastSpeech uses 80.
What changes are needed for FastSpeech to support 20 instead of 80?

xcmyz commented

> LPCNet uses 20 mels while FastSpeech uses 80.
> What changes are needed for FastSpeech to support 20 instead of 80?

You need a different way to process the audio, and then you need to modify the hyperparameters in hparam.py.
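A minimal sketch of what that hparam.py change might look like. The variable names (`num_mels`, `sample_rate`) are assumptions based on typical FastSpeech configurations, not copied from the repo; check the actual file for the real identifiers:

```python
# Hypothetical excerpt of hparam.py after the change; identifiers are assumed.
num_mels = 20        # was 80: LPCNet consumes 20 features per frame
sample_rate = 16000  # LPCNet operates on 16 kHz audio

print(num_mels, sample_rate)
```

The audio preprocessing pipeline would also have to produce the 20-dimensional features LPCNet expects rather than an 80-bin mel spectrogram.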

@xcmyz i did some changes in fastspeech for integrating with lpcnet here are my changes

  1. First preprocessed the audio (LJSpeech) and converted it to PCM (s16):

```shell
mkdir -p dataset/LJSpeech-1.1/pcms
for i in dataset/LJSpeech-1.1/wavs/*.wav
# sample rate 16 kHz for LPCNet, or 22050?
do sox $i -r 16000 -c 1 -t sw - > dataset/LJSpeech-1.1/pcms/$(basename "$i" | cut -d. -f1).s16
done
```
  2. Then applied the diff below to FastSpeech to train the network with 20 mels:

https://github.com/alokprasad/binaries/blob/master/fast_speech_lpcnet.diff
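As a quick sanity check on step 1: the `.s16` files that sox emits with `-t sw` are raw, headerless signed 16-bit PCM, so the file holds exactly 2 bytes per sample. A small round-trip sketch (the file path and the synthetic tone are illustrative, not from the thread):

```python
import numpy as np

# Write one second of a synthetic 440 Hz tone in the same raw format
# sox's "-t sw" produces: signed 16-bit samples, no header.
sr = 16000
t = np.arange(sr) / sr
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
tone.tofile("/tmp/example.s16")

# Reading it back recovers the samples exactly; a real check would point
# at a file under dataset/LJSpeech-1.1/pcms/ instead.
pcm = np.fromfile("/tmp/example.s16", dtype=np.int16)
print(pcm.shape[0])  # one sample per 2 bytes, no header offset
```

If the sample count does not match the expected duration times 16000, the conversion (or the sample-rate choice) is worth revisiting before training.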

Will keep you posted.

Looking forward to your update @alokprasad !