A Speech Interface Toolkit for Neural Speech Synthesis with Pytorch
This repository is made for deploying your neural speech synthesis experiments efficiently. The main feature is defined as:
Matching audio feature parameters and their source codes for using major neural vocoders
They called an interface, which has encode and decode function.
Encode: Convert raw waveform to audio features. (e.g. mel-spectrogram, mfcc ...)
Decode: Reconstruct audio features to raw waveform. (i.e. neural vocoder)
- Usage Examples
- Compare experimental results of neural vocoder with others
- Use directly audio features and neural vocoders for neural speech synthesis models
$ pip install speech_interface
- Hifi-GAN (Universal v1, VCTK, LJSpeech) : speech_interface.interfaces.hifi_gan.InterfaceHifiGAN
- MelGAN (Multi Speaker and LJSpeech from official repository) : speech_interface.interfaces.mel_gan.InterfaceMelGAN
- WaveGlow (LJSpeech) (Universal will be added after solving import error) : speech_interface.interfaces.waveglow.InterfaceWaveGlow
- Multi-band MelGAN (VCTK, LJSpeech) : speech_interface.interfaces.multiband_mel_gan.InterfaceMultibandMelGAN
- Use an interface
import librosa
import torch
from speech_interface.interfaces.hifi_gan import InterfaceHifiGAN
# Make an interface
model_name = 'hifi_gan_v1_universal'
device = 'cuda'
interface = InterfaceHifiGAN(model_name=model_name, device=device)
wav, sr = librosa.load('/your/wav/form/file/path')
# to pytorch tensor
wav_tensor = torch.from_numpy(wav).unsqueeze(0) # (1, Tw)
# encode waveform tensor
features = interface.encode(wav_tensor)
# your speech synthesis process ...
# ...
# reconstruct waveform
pred_wav_tensor = interface.decode(features)
- Checkout available models and audio parameters
from speech_interface.interfaces.hifi_gan import InterfaceHifiGAN
# available models
print(InterfaceHifiGAN.available_models())
# audio parameters
print(InterfaceHifiGAN.audio_params())
- Hifi-GAN : https://github.com/jik876/hifi-gan
- MelGAN : https://github.com/descriptinc/melgan-neurips
- WaveGlow : https://github.com/NVIDIA/waveglow
- Multi-band MelGAN : https://github.com/kan-bayashi/ParallelWaveGAN
This repository is under MIT license.