Uberduck is a tool for fun and creativity with neural text-to-speech. This repository lets you train your own speech synthesis models. See our training and synthesis notebooks, and the Wiki.
The main Tacotron2 model in this repository is based on NVIDIA's Mellotron. The current status of its latent space features is:
- Multispeaker training (functioning, beneficial)
- Torchmoji conditioning (functioning)
- Audio/speaker embedding (functioning)
- Pitch conditioning (not functioning)
- SRMR and MOSNet conditioning (not functioning)
It also includes teacher-forcing-style methods for prosody matching; language, TensorBoard, and TorchScript support; and improvements to learning rate scheduling.
The easiest way to try Uberduck is through the Colab notebooks. To install locally, run:
conda create -n 'uberduck-ml-dev' python=3.8
source activate uberduck-ml-dev
pip install git+https://github.com/uberduck-ai/uberduck-ml-dev.git
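After installing, you can confirm that the package is visible from the active environment with a check like the one below. It is a minimal sketch that uses only the standard library. `check_install` is a hypothetical helper, not part of uberduck_ml_dev, and the version note only mirrors the Python 3.8 used in the conda command above.

```python
import importlib.util
import sys


def check_install(module_name: str = "uberduck_ml_dev") -> bool:
    """Return True if `module_name` can be located in the current environment."""
    # find_spec locates a top-level module without importing it,
    # so this avoids pulling in torch just to check the install.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 8):
        print(f"Note: running Python {sys.version.split()[0]}; the conda environment above uses 3.8")
    print("uberduck_ml_dev installed:", check_install())
```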
- Create your training config and filelists. Use the training configs in the configs directory as a starting point, e.g. this one.
- (Optional) Download the Torchmoji models if training with Torchmoji GST:
wget "https://github.com/johnpaulbin/torchMoji/releases/download/files/pytorch_model.bin" -O pytorch_model.bin
wget "https://raw.githubusercontent.com/johnpaulbin/torchMoji/master/model/vocabulary.json" -O vocabulary.json
- Start training. Example invocation for Tacotron2 training:
python -m uberduck_ml_dev.exec.train_tacotron2 --config tacotron2_config.json
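Malformed filelist lines are an easy way to waste a training run, so it can help to check them before launching. The sketch below **assumes** the pipe-delimited `audio_path|transcript|speaker_id` layout used by NVIDIA's Tacotron2/Mellotron filelists. Verify it against the filelists and configs in the repository before relying on it. `check_filelist` is a hypothetical helper, not part of uberduck_ml_dev.

```python
from pathlib import Path
from typing import List, Tuple


def check_filelist(path: str, check_audio: bool = True) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for a pipe-delimited filelist.

    ASSUMPTION: each non-empty line is `audio_path|transcript|speaker_id`
    (Mellotron-style); this is not a documented uberduck_ml_dev format.
    """
    problems: List[Tuple[int, str]] = []
    with open(path, encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.rstrip("\r\n")
            if not line.strip():
                continue  # blank lines are skipped, not reported
            fields = line.split("|")
            if len(fields) != 3:
                problems.append((lineno, f"expected 3 fields, got {len(fields)}"))
                continue
            audio_path, transcript, speaker_id = fields
            if not transcript.strip():
                problems.append((lineno, "empty transcript"))
            if not speaker_id.strip().isdigit():
                problems.append((lineno, f"non-integer speaker id {speaker_id!r}"))
            # Relative audio paths are resolved against the current working directory.
            if check_audio and not Path(audio_path).is_file():
                problems.append((lineno, f"missing audio file {audio_path}"))
    return problems
```

Run it on each filelist your config references and fix any reported lines before you start training. If the audio paths are relative to a directory other than the one you run from, pass `check_audio=False`.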
We love contributions! To install in development mode, run:
pip install pre-commit black # install development dependencies (ideally in a virtual environment)
git clone git@github.com:uberduck-ai/uberduck-ml-dev.git # clone the repository
cd uberduck-ml-dev # enter the repository
pre-commit install # install the required Git hooks
python setup.py develop # install the library in development mode
To run the test suite, use an environment with uberduck-ml-dev installed and run:
python -m pytest