WaveGRU vocoder for TTS.
Visit this hugging-face-space for a live demo.
This repo implements a WaveRNN vocoder from the paper Efficient Neural Audio Synthesis.
- We predict an 8-bit mu-law transformed signal instead of the 16-bit signal used in the paper.
- We use a pre-emphasis filter with coef=0.86 to achieve good speech quality with only an 8-bit signal (as used by LPCNet).
- We use the upsampling network from Lyra.
- We follow the pruning procedure in the WaveRNN paper.
- However, only the WaveRNN network is pruned to 95% sparsity. The upsampling network is not pruned.
- We use the Lyra sparse matmul library for fast CPU inference in the live demo. Visit here for the source code of the live demo.
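
The signal representation described above (pre-emphasis followed by 8-bit mu-law quantization) can be sketched roughly as follows. This is a minimal NumPy illustration, not the code used in this repo: the function names are hypothetical, the waveform is assumed to be a float array in [-1, 1], and the treatment of the first sample and the rounding rule are assumptions.

```python
import numpy as np


def pre_emphasis(x, coef=0.86):
    """Apply y[n] = x[n] - coef * x[n-1]; the first sample is passed through (assumption)."""
    x = np.asarray(x, dtype=np.float64)
    return np.append(x[:1], x[1:] - coef * x[:-1])


def de_emphasis(y, coef=0.86):
    """Invert pre_emphasis with the recursion x[n] = y[n] + coef * x[n-1]."""
    x = np.zeros(len(y), dtype=np.float64)
    prev = 0.0
    for n, value in enumerate(y):
        prev = value + coef * prev
        x[n] = prev
    return x


def mu_law_encode(x, bits=8):
    """Compress a float signal in [-1, 1] and quantize it to integers in [0, 2**bits - 1]."""
    mu = 2**bits - 1
    x = np.clip(x, -1.0, 1.0)  # pre-emphasis can push samples slightly out of range
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.floor((compressed + 1.0) / 2.0 * mu + 0.5).astype(np.int32)


def mu_law_decode(q, bits=8):
    """Map integer codes back to an approximate float signal in [-1, 1]."""
    mu = 2**bits - 1
    compressed = 2.0 * q.astype(np.float64) / mu - 1.0
    return np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(mu)) / mu
```

In this sketch a vocoder would predict the integer codes returned by mu_law_encode(pre_emphasis(wav)), and synthesized codes would go through mu_law_decode and then de_emphasis to recover a waveform.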
Step 1: download data
python ljs.py
Step 2: extract mel features and mu waveform
python extract_mel_mu.py <wav_dir> <ft_dir>
Step 3: prepare tf dataset
python tf_data.py <ft_dir>
Step 4: train wavegru vocoder
python train_on_tpu.py