Neural-Homomorphic-Vocoder

Unofficial PyTorch implementation of Neural Homomorphic Vocoder by Zhijun Liu, Kuan Chen, Kai Yu.

This paper proposes the neural homomorphic vocoder (NHV), a neural vocoder framework based on the source-filter model.

Abstract: NHV synthesizes speech by filtering impulse trains and noise with linear time-varying (LTV) filters. A neural network controls the LTV filters by estimating complex cepstrums of time-varying impulse responses given acoustic features. The proposed framework can be trained with a combination of multi-resolution STFT loss and adversarial loss functions. Due to the use of DSP-based synthesis methods, NHV is highly efficient, fully controllable and interpretable. A vocoder was built under the framework to synthesize speech given log-Mel spectrograms and fundamental frequencies. While the model costs only 15 kFLOPs per sample, the synthesis quality remained comparable to baseline neural vocoders in both copy-synthesis and text-to-speech.
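To make the source-filter idea in the abstract concrete, here is a minimal NumPy sketch of two of its building blocks: generating an impulse train from frame-level F0, and converting a complex cepstrum into an impulse response via h = IDFT(exp(DFT(c))). It is an illustration only, not this repository's code. The function names, the hold-F0-per-frame simplification, and the assumption that the cepstrum is already laid out in DFT order (negative quefrencies wrapped to the end) are all choices made for this example.

```python
import numpy as np


def impulse_train(f0, sample_rate, hop_length):
    """Build a unit impulse train from frame-level F0 in Hz (0 marks unvoiced)."""
    # Simplification: hold each frame's F0 constant over its hop.
    f0_per_sample = np.repeat(np.asarray(f0, dtype=np.float64), hop_length)
    # Accumulate phase in cycles; emit an impulse when it crosses an integer.
    phase = np.cumsum(f0_per_sample / sample_rate)
    wraps = np.diff(np.floor(phase), prepend=0.0) > 0
    return (wraps & (f0_per_sample > 0)).astype(np.float64)


def cepstrum_to_impulse_response(cepstrum):
    """Map a real-valued complex cepstrum (DFT order) to an impulse response."""
    # The complex cepstrum is the inverse DFT of the log spectrum,
    # so exponentiating its DFT recovers the spectrum of the response.
    spectrum = np.exp(np.fft.fft(np.asarray(cepstrum, dtype=np.float64)))
    return np.fft.ifft(spectrum).real


# Hypothetical settings: 16 kHz audio, 10 ms hop, one second at 100 Hz.
source = impulse_train([100.0] * 100, sample_rate=16000, hop_length=160)
# An all-zero cepstrum corresponds to a flat spectrum, i.e. a unit impulse.
response = cepstrum_to_impulse_response(np.zeros(64))
# Time-invariant filtering of the whole segment, for illustration only.
filtered = np.convolve(source, response)[: len(source)]
```

In an LTV setting the impulse response would change from frame to frame, and the noise branch would be filtered the same way with its own responses; the single convolution above stands in for that only as the simplest case.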

Audio samples and further information are provided in the online supplement.

Installation

Clone the repository and install dependencies.

# the codebase has been tested on Python 3.7 with PyTorch 1.8.0 binaries
git clone https://github.com/LqNoob/Neural-Homomorphic-Vocoder
cd Neural-Homomorphic-Vocoder
pip install -r requirements.txt

Training

python train.py --config config_v1.json
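The abstract mentions that training combines a multi-resolution STFT loss with adversarial losses. As a rough illustration of the first term, the sketch below computes one common formulation (L1 distance on linear and log magnitudes, averaged over several STFT resolutions) in plain NumPy. The function names, the three resolutions, and the exact form of the distance are assumptions made for this example and may differ from what train.py uses; actual training code would need differentiable PyTorch operations instead.

```python
import numpy as np


def stft_magnitude(signal, n_fft, hop_length):
    """Magnitude STFT with a Hann window; one frame per row."""
    signal = np.asarray(signal, dtype=np.float64)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    frames = np.stack(
        [signal[i * hop_length : i * hop_length + n_fft] * window
         for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def multi_resolution_stft_loss(
    predicted,
    target,
    resolutions=((512, 128), (1024, 256), (2048, 512)),  # hypothetical (n_fft, hop)
    eps=1e-7,
):
    """Mean L1 distance on linear and log magnitudes over several resolutions."""
    total = 0.0
    for n_fft, hop_length in resolutions:
        p = stft_magnitude(predicted, n_fft, hop_length)
        t = stft_magnitude(target, n_fft, hop_length)
        linear_term = np.mean(np.abs(p - t))
        log_term = np.mean(np.abs(np.log(p + eps) - np.log(t + eps)))
        total += linear_term + log_term
    return total / len(resolutions)
```

Both signals must have the same length and be at least as long as the largest n_fft. Using several window sizes trades off time and frequency resolution, so no single analysis setting dominates the objective.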

Inference from wav file or for end-to-end speech synthesis

Create a test_files directory and copy the wav files you want to process into it.

Run the following command.

python inference.py --checkpoint_file [generator checkpoint file path]

Generated wav files are saved in generated_files by default.
You can change the path with the --output_dir option.

Acknowledgements

This implementation draws on hifi-gan, dsp, and training_details.