wavegru-vocoder

WaveGRU vocoder for TTS.

Visit this hugging-face-space for a live demo.

Introduction

This repo implements a WaveRNN vocoder from the paper "Efficient Neural Audio Synthesis".

We predict an 8-bit mu-law transformed signal instead of the 16-bit signal in the paper.
- We apply a pre-emphasis filter (coef=0.86), as LPCNet does, to achieve good speech quality with only an 8-bit signal.
We use the upsampling network from Lyra.
We follow the pruning procedure in the WaveRNN paper.
- However, only the WaveRNN network is pruned to 95% sparsity. The upsampling network is not pruned.
We use Lyra's sparse matmul library for fast CPU inference in the live demo. Visit here for the source code of the live demo.
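
As a rough illustration of magnitude pruning toward 95% sparsity, here is a minimal NumPy sketch. It is not the code in this repo: the function names, the cubic ramp constants (`start`, `ramp`), and the 64x64 weight matrix are assumptions chosen for the example. It prunes individual weights and leaves out any block structure that a sparse matmul kernel may require.

```python
import numpy as np


def sparsity_at_step(step, target=0.95, start=1_000, ramp=200_000):
    """Cubic ramp from 0 to `target` sparsity (constants are illustrative)."""
    if step < start:
        return 0.0
    progress = min(1.0, (step - start) / ramp)
    return target * (1.0 - (1.0 - progress) ** 3)


def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries zeroed."""
    num_pruned = int(round(sparsity * weights.size))
    if num_pruned == 0:
        return weights.copy()
    # Find the indices of the `num_pruned` smallest |w| values.
    flat_abs = np.abs(weights).ravel()
    prune_idx = np.argpartition(flat_abs, num_pruned - 1)[:num_pruned]
    mask = np.ones(weights.size, dtype=bool)
    mask[prune_idx] = False
    return weights * mask.reshape(weights.shape)


# Hypothetical weight matrix standing in for one recurrent layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
pruned = magnitude_prune(w, sparsity_at_step(500_000))
final_sparsity = float(np.mean(pruned == 0.0))
print(f"sparsity after ramp: {final_sparsity:.3f}")
```

In training, a function like `magnitude_prune` would be applied every few hundred steps with the sparsity given by the schedule, so the network can adapt gradually as weights are removed.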

Step 1: download data

python ljs.py

Step 2: extract mel features and mu-law waveform

python extract_mel_mu.py <wav_dir> <ft_dir>
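
To show what the mu-law waveform in this step looks like, here is a minimal NumPy sketch of pre-emphasis followed by 8-bit mu-law encoding, plus the inverse operations. It is an assumption-laden illustration, not the contents of `extract_mel_mu.py`: the function names, the 22050 Hz test tone, and the rounding convention are hypothetical.

```python
import numpy as np

MU = 255  # 8-bit mu-law: 256 quantization levels


def pre_emphasis(x, coef=0.86):
    """y[t] = x[t] - coef * x[t-1], with the sample before x[0] taken as 0."""
    return np.append(x[0], x[1:] - coef * x[:-1])


def de_emphasis(y, coef=0.86):
    """Inverse filter: x[t] = y[t] + coef * x[t-1]."""
    x = np.zeros_like(y)
    prev = 0.0
    for t, value in enumerate(y):
        prev = value + coef * prev
        x[t] = prev
    return x


def mu_encode(x):
    """Map floats in [-1, 1] to integer codes in [0, 255]."""
    x = np.clip(x, -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
    return np.round((compressed + 1.0) / 2.0 * MU).astype(np.uint8)


def mu_decode(codes):
    """Map integer codes in [0, 255] back to floats in [-1, 1]."""
    compressed = 2.0 * codes.astype(np.float64) / MU - 1.0
    return np.sign(compressed) * np.expm1(np.abs(compressed) * np.log1p(MU)) / MU


# Illustrative input: one second of a 220 Hz tone at 22050 Hz.
t = np.arange(22050) / 22050.0
wav = 0.5 * np.sin(2 * np.pi * 220.0 * t)
codes = mu_encode(pre_emphasis(wav))       # what the vocoder would predict
restored = de_emphasis(mu_decode(codes))   # waveform after synthesis
```

Pre-emphasis boosts high frequencies before quantization, and the matching de-emphasis filter at synthesis time attenuates them again, which makes the 8-bit quantization noise less audible.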

Step 3: prepare tf dataset

python tf_data.py <ft_dir>

Step 4: train wavegru vocoder

python train_on_tpu.py