🦆 Uberduck Text-to-speech

Uberduck is a tool for fun and creativity with neural text-to-speech. This repository helps you create your own speech synthesis models. To get started, see our training and synthesis notebooks and the Wiki.

Overview

The main "Tacotron2" model in this repository is based on NVIDIA's Mellotron. The current status of the various latent space features is:

Multispeaker training (functioning, beneficial)
Torchmoji conditioning (functioning)
Audio/speaker embedding (functioning)
Pitch conditioning (non-functioning)
SRMR and MOSNet conditioning (non-functioning)

The repository also includes teacher-forcing-style methods for prosody matching, language support, TensorBoard and TorchScript support, and improvements to learning rate scheduling.

Usage

The easiest way to try Uberduck is through the Colab notebooks. To install the package locally instead, follow the steps in the sections that follow.

Installation

conda create -n 'uberduck-ml-dev' python=3.8
conda activate uberduck-ml-dev
pip install git+https://github.com/uberduck-ai/uberduck-ml-dev.git

Training

Create your training config and filelists. Use the training configs in the configs directory as a starting point, e.g. this one.

(Optional) Download torchmoji models if training with Torchmoji GST.

wget "https://github.com/johnpaulbin/torchMoji/releases/download/files/pytorch_model.bin" -O pytorch_model.bin
wget "https://raw.githubusercontent.com/johnpaulbin/torchMoji/master/model/vocabulary.json" -O vocabulary.json

Start training. Example invocation for Tacotron2 training:

python -m uberduck_ml_dev.exec.train_tacotron2 --config tacotron2_config.json
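
The first step above (creating a config and filelists) can be sketched in Python. This is only an illustration under stated assumptions: the filelist uses the pipe-delimited `audio_path|transcript|speaker_id` layout of Mellotron-style filelists, and the config keys (`batch_size`, `n_speakers`, `training_audiopaths_and_text`) and file names are hypothetical placeholders. Take the real key names from a file in the configs directory.

```python
import json
from pathlib import Path


def write_filelist(rows, path):
    """Write one `audio_path|transcript|speaker_id` line per utterance."""
    lines = [f"{audio}|{text}|{speaker}" for audio, text, speaker in rows]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def write_config(base, overrides, path):
    """Merge overrides into a base config dict and save the result as JSON."""
    config = {**base, **overrides}
    Path(path).write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Hypothetical utterances: (audio path, transcript, integer speaker id).
rows = [
    ("wavs/speaker0_0001.wav", "Hello there.", 0),
    ("wavs/speaker1_0001.wav", "Quack quack.", 1),
]
n_written = write_filelist(rows, "train_filelist.txt")

# Hypothetical keys: in practice, load a file from the configs directory
# as the base and keep its real key names.
base = {"batch_size": 16, "n_speakers": 1}
config = write_config(
    base,
    {"training_audiopaths_and_text": "train_filelist.txt", "n_speakers": 2},
    "tacotron2_config.json",
)
```

The resulting `tacotron2_config.json` is the file name passed to `--config` in the example invocation above.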

Development

We love contributions! To install in development mode, run

pip install pre-commit black # install the required development dependencies in a virtual environment
git clone git@github.com:uberduck-ai/uberduck-ml-dev.git # clone the repository
cd uberduck-ml-dev # enter the repository so the following commands run inside it
pre-commit install # install the required Git hooks
python setup.py develop # install the library in development mode

🚩 Testing

In an environment with uberduck-ml-dev installed, run

python -m pytest
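
If pytest fails with import errors, a quick check that the package is visible in the active environment can help. The sketch below is not part of the repository; the helper name `is_installed` is hypothetical, and the module name `uberduck_ml_dev` is taken from the training command above.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# `uberduck_ml_dev` is the module name used by the training command in this README.
if not is_installed("uberduck_ml_dev"):
    print("uberduck_ml_dev is not installed in this environment; install it before running pytest.")
```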