C++/CUDA code for optimized WaveGlow inference. This implementation gives a 25% speedup over NVIDIA's PyTorch implementation in full precision, and a 2.5-3x speedup when using Tensor Cores.
By default, this code will use the GPU's Tensor Cores when running on an NVIDIA Volta GPU.
CUDA C++ implementation of NVIDIA's WaveGlow.
The flow-based model architecture is described in the paper "WaveGlow: A Flow-based Generative Network for Speech Synthesis".
WaveGlow is a flow-based network capable of generating high-quality speech from mel spectrograms. It combines insights from Glow and WaveNet to provide fast, efficient, high-quality audio synthesis without the need for auto-regression.
WaveGlow is implemented using only a single network, trained with a single cost function: maximizing the likelihood of the training data. This makes the training procedure simple and stable.
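The likelihood objective mentioned above is the standard change-of-variables form used by normalizing flows (as in the Glow and WaveGlow papers); a sketch of the formula, where the audio sample \(x\) is mapped through invertible layers \(f_i\) to a latent \(z\) drawn from a spherical Gaussian:

```latex
\log p_\theta(x) = \log p(z) + \sum_i \log\left|\det J_{f_i^{-1}}(x)\right|,
\qquad z = f^{-1}(x)
```

Because every layer is invertible, inference simply runs the flow in reverse: sample \(z\) from the Gaussian and apply \(f\) to produce audio.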
The paper claims that in full precision (32-bit float) WaveGlow produces speech at 500 kHz on a V100. In practice, PyTorch's implementation typically reaches 300-325 kHz; our implementation reaches 400-420 kHz in full precision, and around 1000 kHz when using Tensor Cores.
cpp
├── common (all common files: logger, utils, numpy reader)
│   ├── header
│   └── src
├── sys (ML units, i.e. conv, dense, activation)
│   ├── header
│   └── src
├── waveglow (WN, upsample, main)
│   ├── header
│   └── src
└── tools
    ├── get_waveglow_weights.py
    └── npy_2_aud.py
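The `common` module above includes a numpy reader, since the mel spectrogram inputs are stored as `.npy` files. As a hedged sketch of what such a reader must handle (the repository's actual reader may differ; names here are hypothetical), the `.npy` v1.0 layout is a 6-byte magic `\x93NUMPY`, major/minor version bytes, a little-endian `uint16` header length, then an ASCII Python-dict header of that length:

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Parsed .npy header: the dict string describes dtype, order, and shape,
// e.g. "{'descr': '<f4', 'fortran_order': False, 'shape': (80, 100), }".
struct NpyHeader {
    std::string dict;
};

// Minimal parser for the .npy v1.0 header from an in-memory buffer.
// Returns false on a malformed or unsupported buffer.
bool parseNpyHeader(const std::vector<uint8_t>& buf, NpyHeader& out) {
    static const uint8_t magic[6] = {0x93, 'N', 'U', 'M', 'P', 'Y'};
    if (buf.size() < 10 || std::memcmp(buf.data(), magic, 6) != 0) return false;
    if (buf[6] != 1) return false;  // only format version 1.x in this sketch
    uint16_t len = static_cast<uint16_t>(buf[8]) |
                   (static_cast<uint16_t>(buf[9]) << 8);  // little-endian length
    if (buf.size() < 10u + len) return false;
    out.dict.assign(buf.begin() + 10, buf.begin() + 10 + len);
    return true;
}
```

The raw array data follows immediately after the padded header, so a full reader would parse `shape` and `descr` out of the dict and `memcpy` the remaining bytes into a tensor.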
- Clone the repository
- Download waveglow_weights
- Download mel_spectrograms
- Update the waveglow_weights path in the waveglow/header/hparams.hpp file
- Run the following:
make
ls -d path_2_mel_folder > filename.txt
./waveglow_tts filename.txt OutputDir
python tools/npy_2_aud.py OutputDir
- The audio will be stored in OutputDir in .wav format
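The final conversion step turns the generated `.npy` sample buffers into `.wav` files. As a hedged sketch of what that step must produce (the repository does this with `tools/npy_2_aud.py`; the function below is illustrative, and the mono 16-bit PCM format is an assumption), a WAV file is a 44-byte RIFF header followed by the raw samples:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Build a 44-byte PCM WAV header for mono 16-bit audio.
// WAV fields are little-endian; this sketch assumes a little-endian host.
std::vector<uint8_t> makeWavHeader(uint32_t numSamples, uint32_t sampleRate) {
    const uint16_t channels = 1, bitsPerSample = 16;
    const uint32_t byteRate = sampleRate * channels * bitsPerSample / 8;
    const uint32_t dataBytes = numSamples * channels * bitsPerSample / 8;
    std::vector<uint8_t> h(44);
    auto put32 = [&](size_t off, uint32_t v) { std::memcpy(&h[off], &v, 4); };
    auto put16 = [&](size_t off, uint16_t v) { std::memcpy(&h[off], &v, 2); };
    std::memcpy(&h[0], "RIFF", 4);
    put32(4, 36 + dataBytes);                 // RIFF chunk size
    std::memcpy(&h[8], "WAVE", 4);
    std::memcpy(&h[12], "fmt ", 4);
    put32(16, 16);                            // fmt chunk size (PCM)
    put16(20, 1);                             // audio format: PCM
    put16(22, channels);
    put32(24, sampleRate);
    put32(28, byteRate);
    put16(32, channels * bitsPerSample / 8);  // block align
    put16(34, bitsPerSample);
    std::memcpy(&h[36], "data", 4);
    put32(40, dataBytes);                     // data chunk size
    return h;
}
```

Writing this header followed by the int16 samples (floats from the network scaled to [-32768, 32767]) yields a playable `.wav` file.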
You can also train your own model, then copy the tools/get_waveglow_weights.py file into the waveglow folder and run
python get_waveglow_weights.py <checkpoint path>
Currently, the code takes around 250 ms to generate 10 seconds of speech.